[
    
    
    
    {
        "title"    : "Can you reverse engineer our neural network?",
        "date"     : "February 24, 2026",
        "authorId" : "richeng",
        "author"   : "Ricson Cheng",
        "tags"     : [],
        "minsToRead" : 15,
        "content"  : "A lot of “capture-the-flag” style ML puzzles give you a black box neural net, and your job\nis to figure out what it does. When we were thinking of creating our own ML\npuzzle early last year, we wanted to do\nsomething a little different. We thought it’d be neat to give users a complete\nspecification of the neural net, weights and all. They would then be forced to use the\ntools of mechanistic interpretability to reverse engineer the network—which is a\nsituation we sometimes find ourselves facing in our own research, when trying to interpret\nfeatures of complex models.\n\nWe published the puzzle last February. At the time, we weren’t even sure it was\nsolvable. The neural network we’d designed would output 0 for almost all inputs. A\nreasonable solver might assume that the goal was to furnish an input that produced 1 or\nsome other nonzero value. But we’d engineered the network in such a way, as you’ll soon\nsee, that you couldn’t use traditional methods to brute force your way to an answer—say,\nby backpropagating a nonzero output all the way back to the input layer. You had to\nactually think about what the net was doing.\n\nWe were amazed by the response the puzzle got. Mostly by luck, it seemed like we’d\ncalibrated the difficulty just so: it wasn’t so hard that no one could solve it, and\nwasn’t so easy that we were flooded with responses. In fact if you can solve this puzzle,\nthere’s a decent chance you’d fit in well here at Jane Street.\n\nWe’ll restate the problem below, but be warned that the rest of this post contains huge\nspoilers. If you want to try solving the puzzle yourself, avert your eyes. The rest of\nthis post will walk through the process that an actual solver took, with all the twists\nand turns before they finally cracked it.\n\nThe problem\n\n\n  Today I went on a hike and found a pile of tensors hidden underneath a\nneolithic burial mound! 
I sent it over to the local neural plumber,\nand they managed to cobble together this.\n\n  model.pt\n\n  Anyway, I’m not sure what it does yet, but it must have been\nimportant to this past civilization. Maybe start by looking at the\nlast two layers.\n\n  Model Input\n\n  vegetable dog\n\n  Model Output\n\n  0\n\n  If you do figure it out, please let us know.\n\n\nThat model.pt file is basically just a pickled PyTorch model.\n\nA solution\n\nGetting started\n\nA senior at university named Alex was in his dorm room when a roommate\ntold him about a puzzle that was making the rounds on Twitter. The\nroommate had tried it himself but given up after two nights. Alex, in\nhis final winter at school, was looking for something to do and decided\nto have a look.\n\nHe started by downloading the model and poking around, focusing on the\nlast layer in particular:\n\nimport torch\nimport plotly.express as px\nmodel = torch.load('./model.pt')\nlinears = [x for x in model if isinstance(x, torch.nn.Linear)]\npx.imshow(linears[-1].weight.detach())\n\n\n\n\n\nImmediately it was plain that this was not an ordinary neural network.\nIt clearly hadn’t been trained: all the weights had integer values.\nInstead, it had been designed by hand, probably to carry out some very\nspecific computation.\n\nThe last layer was a 48x1 matrix, but apparently broken into three\nsections. And indeed if you looked at the activations from the previous\nlayer, they were always three repetitions of the same thing. The\nsecond-to-last layer appeared to be three repetitions of the same\nweights, while its bias contained the same 16 bytes, but incremented by\n1 each time, as if encoding a vector v, then v + 1, and v + 2. 
Here’s\nwhat the weights on that second-to-last layer looked like:\n\npx.imshow(linears[-2].weight.detach())\n\n\n\n\n\nand the biases:\n\npx.imshow(linears[-2].bias.detach().unsqueeze(0))\n\n\n\n\n\nThinking about it some—and about the fact that the last layer emitted\na single bit—Alex realized that this second-to-last ReLU layer must be\ncomputing whether two 16-byte integers were equal to one another (with\none byte per neuron). The way it seemed to work is that it made three\ncopies of the input vector v, a 16-byte number. It tried to check that\nagainst a reference number x (which was determined by the bias of the\nsecond-to-last layer). So the three copies would actually represent v -\nx - 1, v - x, and v - x + 1. The last layer applied weights 1, -2, and 1\nto these cases respectively. We can do some casework on an individual\nvalue here: consider the value of ReLU(v-x-1) - 2ReLU(v-x) +\nReLU(v-x+1). If v=x, then this is equal to 1. We won’t show the rest of\nthe cases here, but they all result in 0. The bias on the last layer was\n-15, so the final neuron would only fire when v=x for all 16 bytes.\n\nSo now the question became: how do we get the activations of the\nsecond-to-last layer to equal x?\n\nReverse-engineering the program at the heart of the network\n\nAlex figured that if there’s some number that the network is checking\nagainst at the very end, then the rest of the network must be some sort\nof big equation. There indeed appeared to be a lot of structure in the\nnetwork, as you can see just from plotting the sizes of the 2500 linear\nlayers (about half the full network):\n\npx.line([l.out_features for l in linears])\n\n\n\n\n\nSo Alex began looking at various sub-networks, tracing their\ndependencies. This involved staring at a lot of graph structures.\n\nBut after hours of searching for legible sub-circuits, he came up short.\nFor the moment there just seemed to be too much complexity to trace by\nhand. 
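As a sanity check, the three-copy ReLU equality gadget described above is easy to verify in pure Python. This is an illustrative sketch, not code from the model itself:

```python
def relu(z):
    return max(0, z)

def equality_gadget(v, x):
    # Weights 1, -2, 1 applied to the three shifted copies v-x-1, v-x, v-x+1.
    d = v - x
    return relu(d - 1) - 2 * relu(d) + relu(d + 1)

# The gadget fires only when v == x ...
assert equality_gadget(7, 7) == 1
# ... and is 0 for every other integer difference.
assert all(equality_gadget(v, 7) == 0 for v in range(-255, 256) if v != 7)
```

Summing 16 such gadgets and adding the final bias of -15 then yields 1 only when every byte matches, which is exactly what the last layer does.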
So he had a new idea: what if I treat this thing as a linear\nprogram and just solve it?\n\nThis is, of course, not possible with so many ReLU layers—ReLUs aren’t\nlinear—but they can be modelled by adding an additional binary\nvariable, corresponding to the statement “this activation is negative.” You\ncan thereby treat it as an integer linear program and use a constraint\nsolver capable of integer programming. So that’s what Alex did: he\ndutifully wrote some code to convert the layers of the neural network\ninto a giant linear program and let it run.\n\nAnd let it run.\n\nThat seemed to be going nowhere—so Alex now attempted to reduce the\nnumber of variables in the program. Perhaps there were some reductions\nyou could do? Alex found that if you looked at a bunch of layers, they\nmostly looked like identity matrices. In fact, in 1500 or so layers, 80%\nof the nodes were just performing an identity operation.\n\nAlex treated each neuron in the network as a node in a DAG, where each\nnode goes into the nodes in the next layer with some weights; but if you\never have a node with in-degree 1 and whose weight is exactly 1, you can\ncombine those two nodes. (You know this is safe to do because the\nnetwork has integer values everywhere: all the inputs are integers, as\nare all the weights.)\n\nThere were slightly fancier reductions. For instance, if you have a node\nwhose every incoming edge has positive weight, then the fact that you’re\ndoing ReLU doesn’t matter, because it’s never going to hit the negative\nclamp—and so you can forward its in-edges to its children, directly\npassing them to the next layer. Also, if two neurons in a layer have\nexactly the same input vector, you can combine them, and redirect their\ndescendants to the new merged neuron. And you can repeat this process\nmany times.\n\nAlex by now had poured hours into this analysis. He’d found circuits\nthat appeared to be repeated across many layers. 
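The ReLU-to-integer-program step above uses what is essentially the textbook big-M encoding. Here is a sketch of the general technique (not Alex's actual converter): one extra 0/1 indicator per ReLU makes the clamp expressible with linear constraints, which a brute-force check over small integers can confirm.

```python
M = 100  # big-M bound on the magnitude of the pre-activation (assumed, for illustration)

def feasible(x, y, z):
    # Standard big-M encoding of y = ReLU(x), with binary indicator z
    # (z = 1 means the pre-activation is negative, so the output is clamped to 0).
    return y >= x and y >= 0 and y <= x + M * z and y <= M * (1 - z)

# For every integer x in range, the only feasible y is max(x, 0).
for x in range(-50, 51):
    ys = {y for y in range(0, 101) for z in (0, 1) if feasible(x, y, z)}
    assert ys == {max(x, 0)}
```

A real MILP solver would search over these same constraints; the point is that once every ReLU is encoded this way, the whole network becomes one (very large) integer linear program.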
He’d print out\ndifferent equivalence classes of nodes, looking at the sequence of\nweights that each node had as input, and discovered that there were only a\nfew kinds of nodes. For instance, there was one class of nodes which\neffectively would forward a value from two layers back. Collapsing\nthese, among other similar reductions, brought down the size of the\nlinear program from something like 2 million nodes to 75,000.\n\nBut after all that, Alex ran the solver again, and again it churned\nwithout terminating.\n\nThe final reductions\n\nA new idea: what if you propagated bounds through the network? Just by\nreasoning through one layer at a time, you could figure out the maximum\nvalue that any given node could achieve; you’d do this simply by looking\nat the bounds on its inputs. It turns out that with fairly conservative\nassumptions, many nodes end up with very tight bounds, e.g. from 0 to 1.\nMaybe this was enough to make the program tractable?\n\nAt this point Alex switched from a linear program to a SAT solver, since\nthe total number of values had gotten so much smaller. In the SAT\nversion, you had a boolean variable for each node equalling each value\nin its range. All told this resulted in 200,000 variables after all the\nreductions. After a day of running, the SAT solver reduced the program\nto 20,000 variables. From there it didn’t seem to reduce further.\n\nIn effect Alex had discovered that inside this neural network there was\na core program, irreducibly complex, that—much to his\ndisappointment—was still too large to brute force. So after many days,\nhe had to take a step back, effectively having gotten nowhere.\n\nGlancing off the solution\n\nHe thought meta: this has to be a solvable puzzle, right? How would\nsomeone build a puzzle like this that would be interesting to solve?\nIf you generated random weights, a SAT solver would probably be able to\nsolve it by brute force. This network was created by a human. 
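Backing up to the bound-propagation idea described above: pushing intervals through one layer at a time only needs the signs of the weights. A minimal pure-Python sketch (the layer here is a toy, not taken from the network):

```python
def propagate_bounds(W, b, lo, hi, relu=True):
    # Interval arithmetic through y = ReLU(W x + b): an output's lower bound
    # takes each input's lower bound where the weight is positive and its
    # upper bound where the weight is negative (and vice versa for the upper bound).
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l = bias + sum(w * (lo[j] if w > 0 else hi[j]) for j, w in enumerate(row))
        h = bias + sum(w * (hi[j] if w > 0 else lo[j]) for j, w in enumerate(row))
        if relu:
            l, h = max(l, 0), max(h, 0)
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi

# Toy layer: with both inputs known to lie in [0, 1],
# the first neuron is pinned to [0, 1] and the second to [0, 3].
lo, hi = propagate_bounds([[1, -1], [2, 2]], [0, -1], [0, 0], [1, 1])
assert (lo, hi) == ([0, 0], [1, 3])
```

Repeating this layer by layer yields a finite value range for every node, which is what makes the one-boolean-per-(node, value) SAT encoding possible at all.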
At its\ncore there seemed to be a function that you couldn’t just use search or\noptimization to recover. It was an irreversible function. What were\nsome go-to examples of irreversible functions?\n\nAlex asked ChatGPT for some common hash functions, and compared them\nagainst some basic plots of the layer widths, which looked periodic. In\nfact there were 32 periods of length 48, repeating exactly each time.\nMaybe the network was doing 32 blocks of the same computation? To\nChatGPT again: are there any common hash functions that use 32 blocks of\ncomputation? Bingo. It turned out that roughly all of them do.\n\nTo determine which one was in play here, he explored by hand: he’d input\nsome string into the network, compute various hash flavors with separate\nprograms, then look at the second-to-last layer. It turned out that md5\nlined up and the other common hash functions didn’t.\n\nThis was nice, because he already knew what the hash was supposed to be\nby looking at the second-to-last layer’s biases. So the problem reduced\nto finding an input string that produced that particular md5 hash. But\nit was not obvious how to solve that—especially since he didn’t have a\nreal proof that this network always produced an md5 hash. Maybe the\nsolution was to dig deeper, and hack the network to make it reversible?\n\nA glitch in the matrix\n\nAlex noticed something odd in the network. It seemed to have a bug: if\nyour input was longer than 32 characters, it no longer produced the correct\nmd5 hash. Perhaps somewhere in that bug was a key to reversing the hash\nvalue that was built into the network?\n\nHe spent the next two days reverse-engineering the bug. To start, he got\nGemini to write an implementation of the md5 hashing function. Then he\nmatched up every neuron in the network to the corresponding variable in\nthe md5 algorithm. 
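The hash-flavor comparison described above is easy to reproduce with the standard library. A sketch of the kind of check involved, using the sample input from the puzzle page:

```python
import hashlib

sample = b'vegetable dog'  # the sample input from the puzzle page
digests = {name: hashlib.new(name, sample).digest()
           for name in ('md5', 'sha1', 'sha256')}

# Only md5 even has the right shape: the equality check at the end of the
# network compares exactly 16 bytes, one per neuron, while sha1 and sha256
# produce 20- and 32-byte digests.
assert [len(d) for d in digests.values()] == [16, 20, 32]
```

In practice you would compare the actual digest bytes against the second-to-last layer's activations for a handful of test strings; only md5 lines up.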
He wrote some code that would store the sequence of\nvalues for a given intermediate variable, then search each of the 32\nblocks in the network for that value; this would pick out which ranges\nof neurons corresponded to the bits for each variable. It turned out\nthat some ranges of bits exactly corresponded to the variables, and\nothers were intermediate computation values.\n\nThen, with inputs that were &gt;32, he could painstakingly trace through\nthe blocks to find the exact spot where the network diverged from the\ncorrect algorithm.\n\nThe crux of it was in the first 7 layers—there was a circuit that\nwould compute the length of the input, and attempt to store it in 4\nbytes, in little-endian order. But when the length was 256 bits or\ngreater, the first length byte ended up holding the entire value,\ninstead of the correct multi-byte encoding. That is, if the length were 384 bits,\nthe length bytes should be 128 1 0 0, but what the network encoded\ninstead was 384 0 0 0.\n\nThen the question was: is it possible to exploit this bug by crafting a\nmessage of length 256 bits or greater? Some more painstaking tracing revealed\na few observations. First, there weren’t that many possible lengths:\nthere were only 55 possible input lengths, so he could do an exhaustive search to see\nhow the network behaved with respect to these weird values. Second, the\nbroken length value was converted to binary, and then propagated through\nevery layer in the entire network. In binary, all of the higher-order bits would\nequal 1, and the rest of the number was concentrated in the lowest-order\nbit, so 384 would be encoded as 130,1,1,1,1,1,1,1. Third, the invalid\nbytes from the length of the message were only used in a few blocks of\nthe md5 computation, which always reads bytes from the input in the same\norder.\n\nUsing these observations, it’s possible to write down a modified version\nof the md5 algorithm, which corrects itself at the necessary blocks to\nbe in line with the neural network. 
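The length-encoding divergence described above can be written down directly. A sketch (the function names are mine, and this is illustrative rather than the network's actual circuit):

```python
def correct_length_bytes(bit_len):
    # md5 stores the message bit-length as little-endian bytes
    # (the real algorithm reserves 8 bytes; 4 suffice for these inputs).
    return [(bit_len >> (8 * i)) & 0xFF for i in range(4)]

def buggy_length_bytes(bit_len):
    # The network instead left the entire value in the first 'byte' slot,
    # which only agrees with the correct encoding while bit_len < 256.
    return [bit_len, 0, 0, 0]

assert correct_length_bytes(384) == [128, 1, 0, 0]  # as it should be
assert buggy_length_bytes(384) == [384, 0, 0, 0]    # as the network had it
assert correct_length_bytes(200) == buggy_length_bytes(200)  # they agree below 256
```

Since network activations are unconstrained integers rather than true bytes, nothing forces the first slot to stay under 256, which is why the bug could hide there.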
Looking closely at this, however, it\nstill seemed very difficult to reverse in general.\n\nThis took about two days to figure out, but—disappointment\nagain—didn’t lead Alex any closer to the solution. He wrote to the\nemail address provided on the puzzle with what he’d discovered so far.\nWhat he heard back surprised him: the bug was not intentional. With that\nin mind, why not try to solve it one last time?\n\nThe return of brute force\n\nIt turned out that once you knew the hash encoded in the bias of the\nsecond-to-last layer, you were done. Figuring that out was the meat of\nthe puzzle. The puzzle creator had intentionally made the hash easy to\nbrute force, leaving various small hints in the puzzle description and\nPython code that the solution was composed of two English words,\nlowercased, joined by a space.\n\nAlex had actually tried to brute force the hash earlier, but had\ndownloaded a list of the top 10,000 most popular words to do it, which\nturned out not to be big enough to find it. Once he had a big enough\nword list, he got the answer.\n\nAnother puzzle\n\nOne of the things that made this puzzle challenging was designing a\nnetwork of the right complexity. Using logic gates means the network\nwon’t be differentiable; but if you make the program encoded by those\ngates too complex, there’d be little hope of reverse engineering it. Md5\nfelt like a good compromise, though it was by no means trivial. Because\nmd5 uses modular addition, creating the puzzle required implementing a\nparallel carry adder in 20ish layers of a neural network. Not easy! We\nwere impressed that some solvers managed to figure that out—and Alex’s\ndiscovery of the &gt;32 bug was unexpected and quite extraordinary.\n\nThe experience of creating and releasing the puzzle, and engaging with the folks who\nsolved it, went well enough that we’ve done it again. Here you’ll find the\nlatest. 
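The brute-force step described above comes down to a few lines with hashlib. A toy sketch, where the word list and target are stand-ins (the real answer is deliberately not shown here):

```python
import hashlib
from itertools import product

words = ['apple', 'banana', 'cherry']  # stand-in for a full English word list

def crack(target_hex):
    # Try every lowercase 'word word' pair against the target md5,
    # following the hints that the answer is two space-joined English words.
    for a, b in product(words, repeat=2):
        candidate = f'{a} {b}'
        if hashlib.md5(candidate.encode()).hexdigest() == target_hex:
            return candidate
    return None

# Demonstrate on a target we construct ourselves.
target = hashlib.md5(b'banana cherry').hexdigest()
assert crack(target) == 'banana cherry'
```

With a sufficiently large dictionary, the same loop over word pairs recovers the preimage encoded in the network's second-to-last bias.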
In this new puzzle,\na neural network whose layers have been jumbled up needs to be put back in the right\norder… Can you help?\n\nIf this kind of thing is interesting to you, consider\napplying. You’ll join a\nclose-knit group of brilliant, supportive colleagues, harnessing tens of thousands of\nGPUs, petabytes of training data, and the agility and resources to invest in the best\nideas.\n",
        "url"      : "https://blog.janestreet.com/can-you-reverse-engineer-our-neural-network/",
        "image"    : "https://blog.janestreet.com/can-you-reverse-engineer-our-neural-network/puzzle.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Results from the Advent of FPGA Challenge",
        "date"     : "February 11, 2026",
        "authorId" : ["asinghani","bdevlin"],
        "author"   : null,
        "tags"     : [],
        "minsToRead" : 6,
        "content"  : "At the end of last year, we decided to try something new: a\nchallenge that would run\nalongside Advent of Code, where we asked the community to\nshow us how they could design hardware to solve the same problems. We had no idea what\nlevel of participation to expect, but we received a huge number of submissions, many of\nwhich were incredibly creative!\n\nBefore we get into the top submissions, here are statistics from the competition.\n\nOverall we received 213 submissions, with 46 countries represented. The most came from the\nUnited States (28%), followed by India (23%) and the UK (11%). We also saw a wide range of\nparticipants: 59% were university or PhD students, while 32% were industry professionals\n(including several retired engineers). We were impressed to see participation from a few\nhigh school students as well.\n\nAt Jane Street we use Hardcaml every day, so it was exciting to also see how many\nsubmissions gave Hardcaml a try (we received 151 submissions that attempted at least one\nproblem in it). As expected, Verilog, SystemVerilog, and VHDL were all very\nwell represented, but on top of this we loved seeing the huge variety of other languages\nin the submissions: Chisel, Spade, BlueSpec, Veryl, Clash (including a very nice\nwriteup by Tristan de Cacqueray), Spinal HDL,\nPipelineC, and even someone’s own experimental HDL\nlanguage.\n\nSome problems lent themselves to hardware implementation more easily than others. Days 1\nand 3 were the most popular, both giving rise to “greedy algorithms” implemented using\nstate machines. Days 4 and 7 were natural fits for hardware, using shift-register tricks\nto maximize throughput. On the other hand, days 8 and 11 both required classical graph\nalgorithms, which took some clever tricks to translate into hardware (as seen in some of\nour featured solutions).\n\nJudging the top submissions was really difficult—there were so many great ones. 
In the\nend, we looked for submissions with detailed write-ups that others can reference and learn\nfrom, as well as some trickier solves that took the solution further in some dimension,\nlike demonstrating results on actual hardware or using unexpected implementation\nbackends. In no particular order, these were our favorites:\n\nEric Pearson [Writeups]\n\n\nEric, a retired engineer from Canada, was one of the few participants to solve both parts\nof all 12 days. Working in SystemVerilog, he ran his designs across multiple platforms\nincluding ASIC flows, Altera, and Xilinx toolchains. His submission includes video\ndocumentation showing real-time solutions running on hardware—the Day 8 Part 2 video\ndemonstrating the solution unfolding in real\ntime was a\nparticular highlight.\n\nFrans Skarman [GitLab] [Website]\n\nFrans, a postdoc researcher at Munich University of Applied Sciences, solved all 12 days\nentirely in Spade, a Rust-inspired HDL that he previously\ndeveloped. He set himself the additional constraint of doing all parsing in hardware:\nevery solution receives only raw input bytes and an EOF signal. His Day 10 solution is\nparticularly creative: it generates Spade code from the puzzle input, then uses formal\nverification (bounded model checking) to find the shortest button sequence. After running\nthe formal tools, the depth at which the assertion fails gives you the answer. Day 11 is\nso computationally intensive it requires nearly 64GB of RAM just to simulate! Frans also\nran his designs on a ULX3S board with UART I/O, proving they work on real hardware.\n\nJosef Gajdusek [GitHub] [Blog Post]\n\n\nJosef, a hardware engineer from the Czech Republic, took a retro approach to the\nchallenge: he built a physical PCB with 125 discrete 74-series logic chips to solve Day 1.\nHe developed a clever pipeline that runs a Hardcaml design through Yosys and NextPNR, then\nuses a set of Python scripts to generate and wire up a PCB layout in KiCad. 
Despite the\nholiday timeline, he was able to get the PCBs manufactured and tested by the end of the\ncompetition! While he only tackled one day, the novelty of synthesizing Hardcaml down to\ndiscrete TTL logic on a custom PCB made this one of our favorite submissions.\n\nMatthieu Michon [GitHub]\n\nMatthieu, a professional engineer from France, built an elegant framework for solving the\npuzzles on real FPGA hardware using only the JTAG interface. His designs are\nboard-agnostic and run on any Xilinx 7-series FPGA with sufficient density. What we\nparticularly appreciated was his honest documentation of the debugging process: he\nencountered and resolved simulation/synthesis mismatches, and even opened a Vivado support\nticket for a bug that he discovered in the JTAG TCL commands. His submission demonstrates the\nreal-world engineering challenges of getting hardware designs working correctly, not just\nin simulation.\n\nRémy Citérin [GitHub]\n\nRémy, a university student at École Normale Supérieure in France, used Bluespec to solve\nmultiple days—and then went further by benchmarking his hardware implementations against\nsoftware running on his own custom out-of-order RISC-V CPU. The Day 10 Part 2 solution is\nespecially impressive: he implemented Gauss-Jordan reduction with careful numerical\nstability handling (using GCD-based normalization to keep integers at 16 bits), followed\nby brute-force search over non-basic variables. His detailed README walks through the\nalgorithmic transformations from recursive Python prototypes to explicit-stack Bluespec\nimplementations. It’s a great reference for hardware algorithm design.\n\nRobert Solomon Saab [GitHub] [TinyTapeout]\n\n\nRobert, an electrical engineering student at the University of Toronto, took the challenge\nall the way to silicon. 
His Day 12 solution implements a clever pseudo-recursive\nbacktracking algorithm that tracks placed shapes and orientations (in fact, solving a much\nmore general version of the problem than was required to complete it on Advent of Code).\nWhat makes this submission stand out, though, is that Robert took his Hardcaml design and\nactually taped it out as an IC through TinyTapeout. His design was deliberately\narchitected for minimal hardware usage to fit within TinyTapeout’s constraints while still\ntaking full advantage of the hardware.\n\nSatya Jhaveri [GitHub]\n\nSatya, a professional engineer in Australia, produced one of the most comprehensively\ndocumented submissions of the competition—covering all 12 days with detailed algorithm\nexplanations, complexity analysis, and implementation notes. Beyond the documentation, his\nsolutions feature clever algorithmic insights: an O(1) formula for counting invalid\nnumbers in Day 2, a heuristic to avoid storing O(N²) edges for Day 8’s dense graph, and a\npipelined approach that reduces Day 9 from O(N³) to O(N²k). He also implemented a CSR\ngraph representation with disjoint set union in Verilog for Day 11. For anyone looking to\nlearn how to approach these problems in hardware, Satya’s repository is an excellent\nreference.\n\nWhat started as a simple challenge to solve Advent of Code puzzles in hardware turned into\nsomething far more creative and impressive than we expected: custom CPUs, neural networks,\ndiscrete logic PCBs, and actual silicon. The range of approaches—from a high school\nstudent’s first FPGA project to a retired engineer’s multi-platform validation\nsuite—shows just how far this challenge spread. Thanks to everyone who participated,\nand especially to those who documented their journeys for others to learn from. We’re\nalready thinking about how we could challenge everyone next year! Stay tuned for more.\n",
        "url"      : "https://blog.janestreet.com/advent-of-fpga-challenge-2025-results/",
        "image"    : "https://blog.janestreet.com/advent-of-fpga-challenge-2025-results/hardcaml-advent-of-fpga-results.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "I design with Claude more than Figma now",
        "date"     : "February 5, 2026",
        "authorId" : "emorris",
        "author"   : "Edwin Morris",
        "tags"     : [],
        "minsToRead" : 6,
        "content"  : "For a long time I was skeptical of LLMs—whenever I reached for them I was disappointed by\nthe results. Last year I tried Copilot and Cursor to tweak a game I’d built, and neither\ngenerated working changes. At a previous job I tried Gemini to outline product briefs and\ngenerate wireframes, but ended up throwing them all away. Every time I tried LLMs it was\nfor something I was already good at, and they did a worse job than I would have.\n\nHaving joined Jane Street this past summer, I’m finding AI support indispensable. There’s\njust so much that’s new to me, and so much I’m not good at yet, like\nOCaml and\nBonsai. But one big surprise is how much it’s\nchanged the thing I’m best at: my design workflow.\n\nInstead of laboring over spec docs, building Figma mockups, writing proposals, and\nreviewing the implementation with devs, I find myself building prototype features that\njust do the exact thing I have in mind. What that looks like in practice is:\n\n\n  Write something describing the problem and my proposal\n  Open my editor, start a build, the server, and Claude, using that description I wrote as the prompt\n  Get the basic functionality working to prove to myself that it’s possible\n  Iterate on that as much as I want\n  Push changes to a development environment and ask users what they think\n  Submit a feature (our\nversion of a pull request) that looks and behaves exactly the way I want\n\n\nA prototype feature in the actual codebase has felt better in almost every way compared to\nmockups and docs. Take a prototype I made recently that added LLM prompting to a JSQL\ninput (JSQL is an internal SQL dialect that we use for lots of different user-facing\ntools). This prototype really works, and I spent days living with it and testing\nit. Claude gave me free, unlimited iteration, unbothered when I changed my mind for the\n50th time or asked for a small tweak. 
I refined the Submit button, added keyboard\nshortcuts, tweaked copy, adjusted the prompt, and added generated confirmation\nmessages. These are workflow improvements that would have taken days or weeks of\nengineering and design back-and-forth at my previous job, or more likely would just never\nhave happened.\n\n\n\nAll the effort spent on this feature went into improving the real artifact, and none on\nancillary in-between work like creating Figma components or formatting docs.\n\nIt took me a while to arrive at this workflow. When I joined last summer, I only\napproached smaller-sized tasks with AI, like UX papercut fixes. For bigger ideas I was\nstill using Figma and docs, and when I tried making those things with Claude it\nfailed.\n\nBut in the past 2 months the situations where I’ve reached for Figma have fallen off a\ncliff. Through some combination of improved models, my own facility with them, and\ncarefully choosing the right scope, AI is now working for big stuff too—not just the JSQL\nprompt but a half dozen other prototypes that make user-facing, data model, and library\nchanges, including some that are 2000+ line diffs; I’m using it to implement interactive\nprototypes for brand new apps after designing them in Figma; and for some new apps I’m\neven skipping Figma entirely, iterating on the visual design from the beginning with\nClaude.\n\nAs a designer this has been empowering. Engineers have the ability to create working\nproofs of concept when they have an idea. Designers have to convince other people to do\nthat for us. For an idea like “direct LLM prompting in the JSQL input” I’d be proposing\nsomething whose feasibility is not even clear at the outset; getting someone to build a\nprototype might waste their time. In other cases I might propose something that doesn’t\nclearly fill a user need. 
By using Claude to make these ideas real I’m making it a lot\neasier for others to evaluate them—they can just use it.\n\n\n\nBut there’s a downside: in this workflow, the reviewer is given a fully baked\nfeature. Does that mean they have zero input on the functionality and are just supposed to\nreview the code? Review is not the most fun work—the equivalent in the design world would\nbe getting a detailed wireframe from a PM and being asked to make it look good. I want to\nmake my proposal as clearly and completely as possible, but I still want my engineering\nteammates to treat it the same way they’d treat a mockup in Figma, as something they and I\ncan iterate on together in design-space.\n\nOur solution for now is just to think about these features differently. I write a short\nreminder in the description: prototypes are living proposal docs, the code is disposable,\nand a reviewer’s job is to give feedback about the design and user experience. Eventually,\nreviewers still take over the idea and implement it in a separate feature, referencing the\nprototype but owning the production code. In practice we’re still figuring out what makes\nsense and feels good with this new workflow.\n\nThere’s also a fear I have that designing with Claude keeps me out of a fluid, creative\nmindset and stuck in an iterative one, constrained to the outcomes I think Claude can\nproduce. That’s fine for mature tools, where changes are iterative, but might mean I miss\nideas when working on something new.\n\nThis is a familiar tension. When I was getting started professionally in 2011 there was a\nlot of discourse about whether designers should\ncode. Critics argued that once\nyou’ve started programming you’re less likely to make big changes to an idea. But I liked\nmaking websites, and I liked programming, so I kept writing code. Then, when frontend\nframeworks like React became common and frontend development got more complicated, like\nothers I decided to specialize. 
I still made personal projects in React—that certainly\nhelped me interact with devs—but I spent almost all my time at work in Figma and docs.\n\nHad I joined Jane Street before LLMs, I think I would have become even more entrenched in\nFigma. With JavaScript I at least have some experience; OCaml and Bonsai are entirely new,\nand contributing on a technical level would have felt out of reach. Instead I’m back to\nmaking the real thing, and it feels amazing to be working in the medium again. I feel more\nfree than ever to just try things.\n",
        "url"      : "https://blog.janestreet.com/i-design-with-claude-code-more-than-figma-now-index/",
        "image"    : "https://blog.janestreet.com/i-design-with-claude-code-more-than-figma-now-index/figma-to-claude-hero.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Fun with Algebraic Effects - from Toy Examples to Hardcaml Simulations",
        "date"     : "January 6, 2026",
        "authorId" : "fquah",
        "author"   : "Fu Yong Quah",
        "tags"     : [],
        "minsToRead" : 29,
        "content"  : "I recently ported the Hardcaml_step_testbench\nlibrary, one of the libraries that\nwe use at Jane Street for Hardcaml simulations, from using monads to using algebraic\neffects, a new OCaml 5 feature. This blog post walks through what algebraic effects are,\nwhy you should consider using them in lieu of monads, and how to actually work with them\nusing the Handled_effect library. One\nthing I’ve come to believe is that most of what can be done with monads can be done with\nalgebraic effects in a much more elegant way.\n\nAlgebraic effects were originally added to OCaml for general-purpose concurrent execution\nof programs for OCaml 5, which supports thread-level parallelism. The fact that they can\nbe repurposed for Hardcaml simulations speaks to how well-thought-out and general a\nlanguage feature this is.\n\nI am writing this post as someone who is not a type-theory expert. The fact that I can\nuse algebraic effects without fully understanding the underlying\nmechanics is one nice feature of their\ndesign.\n\nWhat’s wrong with monads?\n\nMonads have been used by OCaml programmers for a long\ntime to model computation. Jane Street’s own monadic Async\nlibrary, which is used for\nconcurrent programming, powers a lot of our infrastructure. Why would we want to replace\nit?\n\nReason 1: Monads infect your code and make everything harder to read\n\nRecall that monads have the following type signature:\n\n(* This is part of the [Monad.S] module type *)\ntype 'a t\n\nval return : 'a -&gt; 'a t\nval bind : 'a t -&gt; f:('a -&gt; 'b t) -&gt; 'b t\nval map : 'a t -&gt; f:('a -&gt; 'b) -&gt; 'b t\n\n\n\nSuppose we are interacting with a server that helps us accumulate numbers. 
Using the\nDeferred.t monad from Async as an example, an interface might look something like this:\n\ntype txn\n\nval start_transaction : unit -&gt; txn Deferred.t\n\nval send_number : txn -&gt; int -&gt; unit Deferred.t\n\nval wait_for_completion : txn -&gt; int Deferred.t\n\n\n\nYou could use these in a bit of code like so:\n\nlet send_numbers_to_server n =\n  let%bind txn = start_transaction () in\n  let%bind () =\n    Deferred.for_ 0 ~to_:(n - 1) ~do_:(fun i -&gt;\n      let%bind () = send_number txn i in\n      printf \"Sent number %d\\n\" i;\n      return ())\n  in\n  let%bind result = wait_for_completion txn in\n  printf \"Transaction done: %d!\" result;\n  return ()\n;;\n\n\n\nFor those not familiar, let%bind x = foo () in bar () is some syntactic sugar that is\ntransformed into bind (foo ()) ~f:(fun x -&gt; bar ()).\n\nThe Async monad is all over this code. let%bind () = surrounds operations that perform\nan asynchronous operation; you need the noisy return () in a bunch of places to ensure\nyou finish on a Deferred.t type; and you have to use special versions of core library\nfunctions like Deferred.List.iter—or here, Deferred.for_—instead of the normal versions,\nbecause you must return a Deferred.t. Once you start using a monad, you’re effectively\ntrapped in it. All code that interacts with your usage must also consider the monad. 
It’s\nannoying.\n\nIf we had a hypothetical Async library built with OxCaml algebraic effects instead, it\nmight look something like this:\n\nlet send_numbers_to_server (h : Deferred.Handler.t) n =\n  let txn = start_transaction h in\n  for i = 0 to n - 1 do\n    send_number h txn i;\n    printf \"Sent number %d\\n\" i;\n  done;\n  let result = wait_for_completion h txn in\n  printf \"Transaction done: %d!\" result;\n;;\n\n\n\nNo special let% ppx; no returns; no monad-flavored standard library functions; and the\noverall function returns a normal value that callers don’t have to treat specially.\n\nReason 2: You can’t use unboxed types and the local mode\n\nMonads make it tricky to use some valuable OxCaml features, notably unboxed types and the\nlocal mode.\n\nConsider the following piece of code:\n\nopen Core \nopen Unboxed\n\n(* Assume [Bar] is some module that implements Monad.S and some utility functions *)\nmodule Bar : sig\n  include Monad.S\n  \n  val do_stuff : unit -&gt; unit t\nend\n\ntype foo =\n  { a : int\n  ; b : int\n  }\n\nlet do_thing () =\n  let foo @ local = { a = 1; b = 2 } in\n  let%map () = Bar.do_stuff () in\n  F64.of_int (foo.a + foo.b)\n;;\n\n\n\nThe above code will not compile! To understand why, let’s consider what the code looks\nlike when we run it through the PPX preprocessor:\n\nlet do_thing () =\n  let foo @ local = { a = 1; b = 2 } in\n  Bar.map (Bar.do_stuff ()) ~f:(fun () -&gt;\n    F64.of_int (foo.a + foo.b)\n  )\n;;\n\n\n\nOxCaml introduces the notion of\nlayouts to types and\nmodes to values. There are many kinds of\nlayouts corresponding to the memory representation of the type, which is out of scope\nhere. In this case, we’re using a layout that’s backwards compatible with OCaml’s default\nmemory representation, namely the value layout. Modes, on the other hand, track the\nproperties of values. 
Here, the mode simply tracks whether a value is allocated in a\ncaller’s stack (it only lives in the caller’s region), or on the global heap (it is\nmanaged by the garbage collector).\n\nOnce you add in the layout and mode annotations, our Bar.map has the following\nsignature:\n\nval map\n  : ('a : value) ('b : value) .\n    'a t @@ global -&gt; f:('a -&gt; 'b) @@ global -&gt; 'b t @@ global \n\n\n\nThe two problems here are:\n\n\n  The foo record is locally allocated. This means that f cannot capture it in its closure\nenvironment, since f has to be globally allocated.\n  The return value of the closure, f, must have the layout value. F64.t, which is the return type\nof F64.of_int, is the unboxed type float#, whose layout is not compatible with the\nlayout value.\n\n\nThe Tools & Compilers group at Jane Street has some work in\nppx_let that tries to get around this somewhat by\nallowing f to be a locally allocated closure. This is a good enough solution for a lot\nof cases, but not in cases where you simply cannot have a locally allocated closure. One such\nexample is Async, which requires the closure to be\nglobally allocated, as the closure itself is passed to the Async scheduler as tasks to be\nscheduled. Furthermore, it only actually solves the first of the two problems above.\n\nEffects are a neat solution to this. They circumvent the need for globally allocated\nclosures, as they shift the task of managing the “environment” of a particular context\ninto the language runtime. This conveniently allows us to use various new fancy OxCaml\nfeatures.\n\nEffects have other upsides, like better stack traces and more seamless composability. (If\nyou’ve ever tried to compose two monads together you will know how painful it can be.)\n\nA layman’s introduction to Algebraic Effects\n\nEffects can be thought of as primitives built into the language that allow you to suspend\nexecution in the control flow of a piece of computation and yield control to a scheduler /\nruntime. 
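For a concrete first taste, here is a minimal sketch using stock OCaml 5’s built-in Effect module (not the type-safe Handled_effect API that the rest of this post uses): a computation performs a hypothetical Ask operation, suspends, and a handler resumes it with a value.\n\nopen Effect\nopen Effect.Deep\n\ntype _ Effect.t += Ask : int Effect.t\n\nlet computation () = perform Ask + 1\n\nlet result =\n  match_with computation ()\n    { retc = Fun.id\n    ; exnc = raise\n    ; effc =\n        (fun (type a) (eff : a Effect.t) -&gt;\n          match eff with\n          | Ask -&gt; Some (fun (k : (a, _) continuation) -&gt; continue k 41)\n          | _ -&gt; None)\n    }\n\n(* [result] is 42: the handler resumed the suspended computation with 41 *)\n\n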
When you “perform” an effect (more on that later), execution pauses and the\nhandler receives a continuation k—a first-class value representing “everything that\nwas going to happen after the perform.” From there you can continue with the underlying\nvalue (the way you might after a let%bind in the Async monad), throw, or hold onto the\ncontinuation for later.\n\nWe will walk through some simple examples before moving on to more\ncomplicated ones for Hardcaml simulations.\n\nIf you are vaguely familiar with OCaml effects in the OCaml\nmanual, you will notice that things look\nquite a bit different. The type-safe OxCaml effects API differs from the stock OCaml\neffects API.\n\nLet’s start with a simple example to understand the API provided by Handled_effect.\n\nLet’s say we want to build a library that allows us to perform a trivial computation that\neither increments or decrements a value. We want to end up with code like this:\n\nlet computation (handler : E.Handler.t @ local) =\n  let x = 1 in\n  let y = E.perform handler (Plus_one x) in (* suspends here *)\n  let z = E.perform handler (Subtract_one y) in (* suspends here *)\n  print_s [%message (x : int) (y : int) (z : int)]\n;;\n\nlet%expect_test \"\" =\n  run_computation computation;\n  [%expect {| ((x 1) (y 2) (z 1)) |}]\n;;\n\n\n\nThis is the core “business logic”: it doesn’t know how Plus_one or Subtract_one are\nimplemented. We’re just setting up the flow of the computation. In monadic Async terms,\nthe code above is like the code the user writes that’s peppered with let%binds. The\nactual performing of the computation (which we’ll get to below) is like what the Async\nscheduler does behind the scenes with that user code.\n\nThe E.Handler.t value, called an “effect handler”, can be viewed as an object that you\npass around to access the implementation for the effect E.  (Notice that we have\nannotated the handler argument with @ local.  
For now, just take my word for it that it\nneeds the local annotation for type-safety purposes. I will elaborate on why at the end of\nthe blog post.)\n\nLet’s break down the other parts. You start by defining the possible effect\noperations. This is a GADT that\nspecifies the operations your computation can perform, each of which will suspend\nexecution. It can look something like this:\n\nopen Core\n\nmodule Effect_ops = struct\n  (* The type argument specifies the return \n     type of performing the effect; here, an int *)\n  type 'a t =\n    | Plus_one     : int -&gt; int t\n      (** An operation that, when given [x], returns [x + 1] *)\n    | Subtract_one : int -&gt; int t\n      (** An operation that, when given [x], returns [x - 1] *)\nend\n\n(* We invoke the Handled_effect `Make` functor on our \n   module to Effect-ify it. *)\nmodule E = Handled_effect.Make (Effect_ops)\n\n\n\nThe module E is the primary module users will interact with when working with effects.\nThere are many functions and types in there, which we will walk through in the example.\nBut below is a heavily simplified interface of the E module that we just constructed:\n\nmodule E : sig\n  module Handler : sig\n    type t\n  end\n \n  (** Performs an operation in the context of running a computation. *)\n  val perform : Handler.t @ local -&gt; 'a Effect_ops.t -&gt; 'a @ once unique\n \n  (** Evaluates a computation *)\n  val run : (Handler.t @ local -&gt; 'a) -&gt; 'a Result.t\n  \n  module Result : sig\n    type ('a, 'e, 'es) t =\n      | Value : 'a -&gt; ('a, 'e, 'es) t\n        (** This is returned when the computation is finished. *)\n      | Exception : exn -&gt; ('a, 'e, 'es) t\n        (** The computation raises an unhandled exception. We’ll \n            ignore exceptions for the purposes of this blog post. 
*)\n      | Operation :\n          ('o, 'e) op  * ('o, 'a, 'e, 'es) continuation\n          -&gt; ('a, 'e, 'es) t\n        (** This is returned when the computation calls \n            [E.perform operation]. The first argument is the operation \n            in question, and the second argument is a continuation object \n            that can be used to resume execution of the computation with \n            the result.\n        *)\n  end\nend\n\n\n\nSo when you use Effects you’re really thinking about two pieces:\n\n\n  The computation—this is your business logic. Computations have a type signature\nE.Handler.t @ local -&gt; 'a, where 'a is the return value of the overall computation.\n  The operation handlers—this is the code that interprets what the operations mean.\n\n\nWhen the computation calls E.perform, the execution jumps to the operations\nhandler. From the operations handler, you resume execution of the computation using the\ncontinuation by calling Handled_effect.continue k.\n\nWriting the computation is usually straightforward and not much different from writing\nregular code without Effects. Most of the complexity lies in writing the operation\nhandlers. Here’s what it looks like in this example:\n\nlet rec handle_computation_result (result : (_, _) E.Result.t) =\n  match result with\n  | Value result -&gt;\n    (* The computation has reached the end and returned the result *)\n    result\n  | Operation (op, k) -&gt;\n    (* If we're here, the effect has suspended. The [op] type is the set of\n       operations the computation can perform as expressed by our [Effect_ops]\n       type. 
[k] is a continuation that the user can use to resume the\n       computation execution with [Handled_effect.continue]\n    *)\n    (match op with\n     | Plus_one x -&gt;\n       handle_computation_result (Handled_effect.continue k (x + 1) [])\n     | Subtract_one x -&gt;\n       handle_computation_result (Handled_effect.continue k (x - 1) []))\n  | Exception exn -&gt;\n    (* In real examples, we would do smarter things with exceptions, but to keep\n       these examples easy to follow, we simply reraise them.\n     *)\n    raise exn\n;;\n\nlet run_computation (type a) (computation : E.Handler.t @ local -&gt; a) : a =\n  handle_computation_result (E.run computation)\n;;\n\n\n\nAlgebraic Effects for Hardcaml simulations\n\nThe Hardcaml_step_testbench\nlibrary is one that we use for\nFPGA simulations in Hardcaml. We have used this library for years to simulate FPGA\ncircuits. Recently, we added support for an effect-based API.  It’s a nice demonstration\nof what effects can get us in a domain that they were not originally designed for.\n\nWe will walk through some example code that emulates the core behaviour of this library to\nshowcase the effectful API. The actual library is much more featureful, but the core idea\ncan be illustrated with a toy implementation.\n\nA digital circuit can be abstractly thought of as a stateful component that consumes\ninputs and produces outputs atomically at every time step. Hardware designers call this\ntime step a clock cycle, as physically it corresponds to a clock signal that is supplied\nto the circuit. An important distinction from software programs is that inputs and outputs\nare consumed at every clock cycle, whether or not the circuit is performing any higher-level\napplication functionality.\n\n(To the digital design engineers—I’m restricting this to synchronous, single-clock-domain\ncircuit simulations. 
There are tricks we do to deal with multiple clock\ndomains, but it’s not important for what we’re talking about here.)\n\nCompiling these digital circuits into FPGAs can take on the order of hours (or closer to\nmonths for ASICs). We like getting faster feedback on the changes we make to our\ntestbenches and hardware, so we write simulations for these circuits.\n\nIn running digital circuit simulations, we are trying to synchronize two interacting\ncomponents:\n\n\n  The circuit itself\n  Testbench threads that synchronize and interact with the circuit\n\n\nThe execution flow of the testbench and simulated circuit looks something like this:\n\n\n\nIn practice, we oftentimes have multiple threads of execution in our circuit\ntestbenches. This is useful, as digital circuits can sometimes have complicated and\n(mostly) disjoint parts that different testbench threads might want to interact with\nindividually.\n\n\n\nFor the purpose of this blog post, we are going to make some simplifications:\n\n\n  When the user runs the computation, all the concurrent tasks must be known at that point\n  We require all computations to have no return values (i.e., they return unit)\n\n\nSo what makes this tricky in a pre-effects world? Imagine we want to interleave the\nfollowing two computations, with cycle () representing the synchronization points:\n\n[ (fun () -&gt;\n     for i = 0 to 2 do\n       cycle ();   (* Synchronization point *)\n       printf \"foo %d\\n\" i;\n     done)\n; (fun () -&gt;\n     for i = 0 to 2 do\n       cycle ();   (* Synchronization point *)\n       printf \"bar %d\\n\" i;\n     done);\n]\n\n\n\nIn the pre-Effects world, the only way we can do this is by using closures. We need\nclosures because we don’t have a practical way of representing the part of the execution\nthat comes after the call to cycle.\n\nThe most ergonomic way to do this happens to be with monads. 
You could imagine having a\ncomputation monad that looks something like this:\n\ntype 'a t =\n   | Return : 'a -&gt; 'a t\n   | Bind   : ('a t * ('a -&gt; 'b t)) -&gt; 'b t\n   | Cycle  : unit t\n   \nval cycle : unit -&gt; unit t\n\nval run_computations : (unit -&gt; unit t) list -&gt; unit\n\n\n\nThe run_computations bit of code is quite involved so I won’t include it here. (The fact\nthat it’s so complex is one motivation for using Effects.) The rough intuition is:\n\n\n  Start with the computation at the head of the list. Walk it up to a Bind (Cycle, f)\npoint, then set aside and store the f closure. If the evaluation instead ends at\nReturn (), just mark the computation as done (there is no closure to store in this case).\n  Move on to the next computation, and do the same thing, until you get to the end of the\nlist.\n  Repeat the process until all the computations are done.\n\n\nPutting that all together, the monadic version of our testbench runner might look\nsomething like this:\n\nlet%expect_test \"\" =\n  run_computations\n    [ (fun () -&gt;\n        for_ 0 ~to_:2 ~do_:(fun i -&gt;\n          let%bind () = Step.cycle () in\n          printf \"foo %d\\n\" i;\n          Step.return ()))\n    ; (fun () -&gt;\n        for_ 0 ~to_:2 ~do_:(fun i -&gt;\n          let%bind () = Step.cycle () in\n          printf \"bar %d\\n\" i;\n          Step.return ()))\n    ];\n  [%expect\n    {|\n    foo 0\n    bar 0\n    foo 1\n    bar 1\n    foo 2\n    bar 2\n    |}]\n;;\n\n\n\nThis of course has all the problems of monads that we talked about earlier. But in a\npre-Effects world there isn’t another way to support the notion of a synchronization\npoint.\n\nWith OxCaml Effects, we do have a first-class way of representing the “computation to\ncome”. 
We can have an API that looks something like the following:\n\nmodule Handler : sig\n  type t\nend\n\nval run_computations : (Handler.t @ local -&gt; unit) list -&gt; unit\n\nval step : Handler.t @ local -&gt; unit\n\n\n\nThe same code above can be written in a much cleaner style:\n\nlet%expect_test \"\" =\n  run_computations\n    [ (fun h -&gt;\n        for i = 0 to 2 do\n          step h;\n          printf \"foo %d\\n\" i\n        done)\n    ; (fun h -&gt;\n        for i = 0 to 2 do\n          step h;\n          printf \"bar %d\\n\" i\n        done)\n    ];\n  [%expect\n    {|\n    foo 0\n    bar 0\n    foo 1\n    bar 1\n    foo 2\n    bar 2\n    |}]\n;;\n\n\n\nNote the interesting bit in this API: whenever a computation calls step, it yields\ncontrol to the “step runtime” and waits at a synchronization point until all the threads reach\ntheir respective synchronization points. This is a very powerful feature that allows each\nof the testbench computations to interact independently with the circuit in the same\nstate, without having to coordinate their work explicitly. At the synchronization point,\nthe underlying circuit simulator will advance the circuit by one step, before resuming the\nexecution of the computations. As high-level programmers interacting with the\nHandled_effect library, we don’t need to understand how this all works under the hood.\n\nSo, how does one actually go about implementing the above code with effects?\n\nSimilar to before, we first define the effect operations. In this case, we need to perform\nan operation to yield control to the runtime, which we do with a Step operation. 
We\nalso define a step function that the users will interact with.\n\nmodule Effect_ops = struct\n  type 'a t = Step : unit t\nend\n\nmodule E = Handled_effect.Make (Effect_ops)\nmodule Handler = E.Handler\n\n(* val step : Handler.t @ local -&gt; unit *)\nlet step (h : Handler.t @ local) = E.perform h Step\n\n\n\nThen, we define some kind of state tracker for the execution of a computation. We encapsulate\nthis in a Thread_state.t type. The trick here is that when the computation performs\nan effect, we don’t actually need to call the continuation immediately. We can have several\nconcurrent computations in flight, and several continuations.\n\nmodule Thread_state = struct\n  type t =\n    | Unstarted of (Handler.t @ local -&gt; unit)\n    | Running of (unit, unit, unit) E.Continuation.t Unique.Once.t\n      (** When the computation state is [Running], it means the computation called\n          [step] and has suspended its execution. It will synchronize with all other\n          computations at their respective calls to [step] before advancing.\n\n          The continuation object has the unique mode, which means the compiler verifies\n          that it can only be used exactly once. [Unique.Once.t] is used here\n          to cross the continuation into the aliased mode (it can be used multiple\n          times, which is the default in OCaml) to defer this check to runtime\n          rather than compile time. 
The exact details of how it works are not too\n          important for the purposes of this blog post.\n        *)\n    | Finished\n\n  let handle_result (result : (unit, unit) E.Result.t @ once unique) =\n    match result with\n    | Value () -&gt;\n      Finished\n    | Exception e -&gt;\n      Exn.reraise e \"Step raised exn\"\n    | Operation (op, k) -&gt;\n      (match op with\n       | Step -&gt; Running (Unique.Once.make k))\n  ;;\n\n  (* Advance the computation until it calls [step] *)\n  let run_until_step t =\n    match t with\n    | Unstarted computation -&gt;\n      handle_result (E.run computation)\n    | Running k_uniq -&gt;\n      handle_result (Handled_effect.continue (Unique.Once.get_exn k_uniq) () [])\n    | Finished -&gt; Finished\n  ;;\nend\n\n\n\nWith all these pieces together, we can implement the run_computations function.\n\nlet run_computations (computations : (E.Handler.t @ local -&gt; unit) list) =\n  let states =\n    Array.map (Array.of_list computations) ~f:(fun computation -&gt;\n      Thread_state.Unstarted computation)\n  in\n  while\n    Array.exists states ~f:(function\n      | Finished -&gt; false\n      | Unstarted | Running _ -&gt; true)\n  do\n    Array.map_inplace states ~f:Thread_state.run_until_step\n    (* In practice, we will advance time on the underlying circuit being simulated. *)\n  done\n;;\n\n\n\nThis is a simplification of what happens in practice, but not by much! The only primitive\nwe are missing here is spawn, which allows a computation to spawn further\ncomputations. 
This is something we can also achieve with Algebraic Effects.\n\nOxCaml’s type safety guarantees, or, Why does a handler have a local mode?\n\nThe paper on Locality and Effect Reflection goes\nthrough the formal argument for why our system uses a local mode for type-safe effects.\nHere I will attempt to give a brief intuitive explanation.\n\nRecall that we define a computation with an effect E as E.Handler.t @ local -&gt; 'a.\nWhen we call E.run on such a computation, it evaluates the function in an environment that can\nintercept calls to E.perform to yield control to the effect operations handler.  In\nother words, we can’t arbitrarily call E.perform outside the context of having\nregistered effect handlers for E.\n\nSuppose we could run computations of type E.Handler.t -&gt; 'a, without the local\nannotation on the handler. One can see how you could express ill-defined logic in this case:\n\nlet global_handler = E.run (fun h -&gt; h) in\nE.perform global_handler operation\n\n\n\nInside E.run, we are in the context with an appropriate effects operation handler.\nBut what happens when we call E.perform global_handler operation? E is no longer handled!\n\nThe @ local stops us from doing this. The above code snippet will fail to compile\nwith a mode error.\n\n(* File \"fail_compilation_example.ml\", line 82, characters 39-40:\n   Error: This value is local but is expected to be global. *)\nlet global_handler = E.run (fun h -&gt; h) in\nE.perform global_handler operation\n\n\n\nWrapping up\n\nI hope this post helps convince you that effects &gt; monads in many cases, and that it’s\nworth giving the Handled_effect library a\nlook.\n",
        "url"      : "https://blog.janestreet.com/fun-with-algebraic-effects-hardcaml/",
        "image"    : "https://blog.janestreet.com/fun-with-algebraic-effects-hardcaml/banner.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Getting from tested to battle-tested",
        "date"     : "December 3, 2025",
        "authorId" : "dpatti",
        "author"   : "Doug Patti",
        "tags"     : [],
        "minsToRead" : 18,
        "content"  : "Testing is an essential part of building reliable software. It’s a form of documentation,\na reminder of mistakes of the past, and a boost of confidence when you want to\nrefactor. But mostly, testing is a way of showing that your code is correct and\nresilient. Because it’s so important, we’ve invested a lot of effort at Jane Street to\ndevelop techniques that make tests clearer, more effective, and more pleasant to write.\n\nBut testing is still hard. It takes time to write good tests, and in any non-trivial\nsystem, your tests are an approximation at best. In the real world, programs are messy.\nThe conditions a program runs under are always changing – user behavior is unpredictable,\nthe network blips, a hardware failure causes a host to reboot. It’s inherently chaotic.\nAnd that’s the hard thing about developing high-availability systems: for all the careful\ntests that you think to write, there are some things you can only learn by experiencing\nthat chaos. That’s what it takes to go from merely being tested to being battle-tested.\n\nWe spend a considerable amount of time thinking about this problem in our development of\nan internal distributed system called Aria. Aria is a low-latency shared message bus with\nstrong ordering and reliability guarantees – you might recognize it from an episode of\nSignals and Threads where I talked about how it acts as a platform for other teams\nto build their own resilient systems with strict uptime requirements.\n\nMore and more teams have been adopting Aria at Jane Street, which is great! But it also\nmeans that each week that goes by without an incident becomes less of a tiny victory and\nmore of an obligation to keep the system running smoothly. Not to mention, the system has\nto continue to grow in scale and complexity to meet the needs of the teams that use\nit. How do we mitigate the risks that naturally come with change so that we can keep\nevolving the system? 
Testing goes a long way here, but it’s all too easy for your tests to\nmiss the critical scenario that will expose your mistake.\n\nEarlier this year we started using Antithesis, an end-to-end automated\ntesting platform, to fill those gaps. We’ve become huge fans of the service (and are now\nleading their next funding round! More on that later), and part of the point of this\npost is to explain why.\n\nBut before we get to that, let’s lay some groundwork for how Aria approaches testing.\n\nTesting everything you can think of\n\nWhile none of this is exactly novel, we’ve built up a rather extensive toolbox of\ndifferent testing techniques:\n\n\n  Unit tests of modules and data structures without side-effects, including many\nsimple state machines.\n  Integration tests with a simulated networking layer which allows for testing very\nfine-grained interactions between services, including delaying and dropping packets and\nmanipulating time.\n  Quickcheck tests that can\nproduce random orderings of events which we can feed into a simulation.\n  Version skew tests to ensure that new client library changes work with existing\nservers and older client libraries will be compatible with newer servers.\n  Fuzz tests using AFL which will turn the fuzzer’s byte\ninput stream into a sequence of state updates in an attempt to catch unsafe behavior in\nperformance-optimized state machines.\n  Lab tests to check for performance regressions which run nightly in a dedicated lab\nenvironment that is set up similar to production.\n  Chaos testing where our staging environment runs a newer version of the code while\nwe apply simulated production-like load and restart services randomly.\n\n\nEach one of these adds real value, but the simulated networking is maybe the most\nimportant piece. 
The ability to write tests which don’t require excess mocking and are\nalso fast and deterministic means that you can express more edge cases with less effort,\nget more introspection on the state of components, and run the entire suite in every build\nwithout worrying about flakiness. It is an invaluable tool when writing new features, as\nwell as a great way to write reproduction tests when verifying bug fixes.\n\nAria’s testing story requires a lot of effort and has evolved organically over time, but\nit also has been quite successful. Incidents in production are few and far between, even\nas we deploy new changes each week.\n\nWhen we do encounter a bug that slipped through, there’s always a sense of “oh, that’s a\nreally tricky case, it’s no wonder we didn’t think to test it”. Even our quickcheck and\nfuzz tests are limited to the confines of the artificial environments we construct for\nthem, and the chaos testing barely scratches the surface of what’s possible.\n\nTesting everything you didn’t think of\n\nLast year we had a chance to talk with the team at Antithesis and got really excited about\ntheir product.  The amazing thing that Antithesis does is run your whole system in a\nvirtual machine controlled by a completely deterministic hypervisor, and then adds a\nlittle manufactured chaos by interfering with scheduling and networking.  It uses this\nsetup to explore many different scenarios, and to discover circumstances where your system\nmight fail.\n\nPart of what’s great about this is that you don’t need to change your system to use\nAntithesis. You can run your system in a realistic environment – network, file system,\nshared memory, it’s all there. You get to interact with your system using real client\ncode. 
And if they do manage to make a process crash or cause an assertion to fail, you can\nreplay events to get back to that state and interact with the system as much\nas you want to understand what happened.\n\nWe weren’t sure how effective it was going to be, so we started with a trial period to\nfind out. Sure enough, on our first run, Antithesis surfaced two previously unknown bugs\n– notably, one had just been introduced a month prior, and seemed pretty likely to\neventually occur in production, and with fairly consequential effects. We’d actually\nthought about the possibility of this kind of failure when designing the change, but a\nsimple bug in the code slipped through, and we just forgot to write an explicit test.\n\nThere’s something really attractive about running your system in a way that looks and\nfeels like production. You can be a bit more confident that you’re not accidentally hiding\naway some race condition by rewiring everything to fit into a little box. I find the “API”\nof Antithesis to be quite elegant: provide some Docker images and a\ncompose file that describes the individual parts of your system, and they will\ncall docker compose up inside the VM. That gets the system into a running state,\nbut you obviously need to make it do something. So, you can create a directory in a\ncontainer full of executable files that each take some kind of action on your system –\nlike actions users or admins would take in production – and Antithesis will decide how\nand when to run them. And by and large, that’s it.\n\nOf course, the generality here is a double-edged sword: the space of all possible states\nand inputs is enormous. Even if you threw tons of hardware at the problem, you’d probably\nonly do a bit better than our chaos testing. That’s why the second half of Antithesis –\nthe exploration engine – is so important. 
One of the cool properties of determinism is\nthat you can not only reconstruct a state at any time, but also rewind to a prior\nstate. So you can effectively rewind time and try a new approach. If the explorer is\ngetting feedback from which branches of code it managed to hit, it can know when it got\ninto an interesting or rare state, and it can spend more time taking different actions\naround that moment. Will Wilson, one of the co-founders of Antithesis, gave a talk\nwhich demonstrates some of the principles behind this search using the NES game Super\nMario Bros. as a test subject – it’s such a fun talk; I highly recommend checking it\nout.\n\nSo let’s say Antithesis stumbles upon a bug. What does that look like, and where do you go\nfrom there?\n\nA real bug\n\nWe kick off a test run each night with the most recent revision of code, and one morning\nwe came in to find results that showed an unexpected container shutdown. At first glance,\nthe logs included this.\n\n118.738  standby.replicator.1  Info: Streaming tcp receiver connected to 10.89.5.61:42679\n118.738  standby.replicator.1  Error: Unhandled exception raised:\n118.738  standby.replicator.1  (monitor.ml.Error\n118.738  standby.replicator.1    (message_part.ml.Malformed_message\n118.738  standby.replicator.1      (\"00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  \n118.738  standby.replicator.1       \"00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  \n118.738  standby.replicator.1       \"00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  \n118.738  standby.replicator.1       \"00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
\n    \n\n\nThe “replicator” service connected to a server and, shortly after, raised an exception and\ncrashed. The 118.738 is the time in seconds since the test started. The exception made\nit look like it was being served corrupt data, which should never happen under any\ncircumstances. Antithesis also has a tool that can investigate a specific instance of a\nfailure by rewinding a bit, running with different input, and seeing whether it failed\nagain. It produces a graph like this.\n\n\n\nThis shows that somewhere around 6 seconds before the crash, something happened that\nput us in a state where it was very likely to reproduce. If we go back through the logs,\nwe can see that Antithesis randomly killed a different service around that time.\n\n\n    \n    \n        \n        111.861\n        fault_injector\n        {\"fault\":{\"name\":\"kill\",\"affected_nodes\":[\"primary.tip-retransmitter.1\"]}}\n    \n\n\nWe can also filter the logs down to look for that specific service.\n\n\n    \n    \n        \n        113.911\n        primary.tip-retransmitter.1\n        Info: Starting from snapshot with stream time 2025-11-28 16:59:51.362900555-05:00\n    \n        \n        113.911\n        primary.tip-retransmitter.1\n        Debug: Streaming TCP retransmitter listening on 10.89.5.61:42679\n    \n\n\nAnd that also lists the same host and port that the replicator connected to. But this\nstill doesn’t say much – a server restarted, a client connected to it, and then the\nclient got corrupt data? At this point we can jump into Antithesis’ debugger environment,\nwhich lets you write notebook-style snippets to run inside the virtual machine. 
By\nrewinding time to one second before the crash and running tcpdump, we can capture the\nexact traffic that was exchanged between the client and the server.\n\nbranch = moment.rewind(Time.seconds(1)).branch()\ncontainer = 'standby.replicator.1'\nprint(bash`tcpdump -nn -X tcp and host 10.89.5.61`.run_in_background({ branch, container }))\nbranch.wait(Time.seconds(5))\n\n\n\nAnd with a little grit, we can extract the query that the client sent.\n\n\n    16:59:57.631701 IP 10.89.5.36.35922 &gt; 10.89.5.61.42679: Flags [P.], seq 40:67, ack 40, win 32768, options [nop,nop,TS val 2689576410 ecr 3733209032], length 27\n\t0x0000:  4500 004f c841 4000 4006 5355 0a59 0524  E..O.A@.@.SU.Y.$\n\t0x0010:  0a59 053d 8c52 a6b7 934f 4b7a df85 101d  .Y.=.R...OKz....\n\t0x0020:  8018 8000 1f54 0000 0101 080a a04f adda  .....T.......O..\n\t0x0030:  de84 3fc8 1900 1500 0000 51c8 0400 0000  ..?.......Q.....\n\t0x0040:  0000 0041 3131 3233 0000 00ff ffff ff    ...A1123.......\n\n\nThe byte offset that was requested by the client is encoded in the eight bytes starting\nat 51c8 near the end of the dump. It’s a little-endian 64-bit integer whose value is\n0x04c851, or 313425 in decimal. Okay, so what did that snapshot contain?\n\ncontainer = 'primary.tip-retransmitter.1'\nprint(bash`aria admin get-latest-snapshot -max-stream-time '2025-11-28T16:59:51.362900555-05:00' \\\n            | sexp get '.snapshot.metadata.core_stream_length'`.run({ branch, container }))\n\n\n\nHere we not only get to use our own admin command to talk to a server, but we can also\nsimply pipe the output to another tool of ours that dissects and pretty-prints the output.\n\n\n    ((stream_time 2025-11-28T16:59:51.362900555-05:00)\n (byte_offset 315567))\n\nThis tells us that the server started from byte offset 315567, which is after the\noffset of the request. It should have served the client an error, not bad data! 
At this\npoint we have enough of a picture to read through the code and figure out what’s wrong.\n\nThe gritty details\n\nThis bug was related to a new feature extending the “tip-retransmitter” service, which was\nmentioned in the logs above. These services provide data to clients (the “replicator” in\nthis case) on demand from an in-memory ring buffer – only the most recent data in the\nstream, or the “tip”, is available. These services had been in use for a long time but\nrecently were given the ability to serve clients in other regions in addition to local\nclients. Something about this new behavior was buggy.\n\nAfter closer inspection, we realized that the implementation made some incorrect\nassumptions about the state of its ring buffer when checking if the client request was\nvalid. However, this only manifests\n\n\n  after the server was restarted and loaded a snapshot,\n  before the ring buffer was filled up, and\n  if the client sends a request for data before the snapshot.\n\n\nThis is exactly what Antithesis managed to reproduce. Instead of an error, the server\nincorrectly sent back NUL bytes from an empty region in the ring buffer. At the time the\noriginal code was written, snapshots didn’t exist, so the bug couldn’t have occurred. It\nwas only introduced later on.\n\nBut hold on a second: loading from snapshots had been around for a while, yet this only\nfailed once we extended it to serve other regions. Had it always been broken? Well, sort\nof. It turns out that local clients use a different method of service discovery, which\nmeans they won’t even try to talk to a server that was started from a later snapshot\nbecause they knew it didn’t have the data. 
The clients in another region used a different\nmethod of service discovery and simply had to optimistically try.\n\nThis had all the ingredients for a tricky bug:\n\n\n  It required a niche situation where a server was restarted and a client connected to it\nafter it advertised and before it filled up its ring buffer, asking for data from before\nits snapshot.\n  It was code that had already been running in production for a long time, but the bug was\nbeing masked by the service discovery mechanism.\n  Because we were leveraging existing code, we didn’t think to write a new test,\nespecially for this situation.\n\n\nAnd the potential impact was really bad, since it involved serving corrupt data.\n\nHappily, Antithesis was just what we needed to catch the bug before it caused real problems.\n\nAntithesis found the bug shortly after the feature was completed and the new services\nadded to our Antithesis config. This time delay was short enough that we knew that\nsomething about our recent change was the culprit.\n\nIt also gave us the tools to actually dig in and figure out what happened. If this\nhappened in production, we would have gotten the exception, and we might have been able to\nnotice the log lines, but we wouldn’t have had enough data to narrow down the situation,\nand we wouldn’t have had a good way to verify that the fix we wrote addressed the actual bug.\n\nIt’s not that Antithesis replaces all of our existing testing. Each different flavor of\ntest serves its own unique purpose. But the way in which Antithesis tests\nwhole-system scenarios that we wouldn’t have thought to test is its own kind of\nmagic. Enough so that we’ve noticed a small cultural shift on the team where we feel like\nwe can tackle more ambitious projects by relying on Antithesis to fill in any gaps along\nthe way.\n\nWhere do we go from here?\n\nAntithesis has been really useful for Aria, and we’ve started working on applying it to\nother applications within Jane Street. 
We’re starting out with some similar,\nhigh-assurance distributed systems, like a new distributed object store that’s in\ndevelopment.\n\nBut we think there are lots of other opportunities for applying the tool. For one thing,\nwe’re excited about using Antithesis on systems whose testing story is less developed than\nAria’s. Not every system at Jane Street has gone to the trouble of using mockable network\nand timing services that let you build nice, deterministic simulation tests. Sometimes,\nthat kind of testing is simply infeasible, since some parts of the system rely on external\nsoftware that we don’t fully control. But that kind of software is still easy to run in\nAntithesis.\n\nWe also think that Antithesis holds a lot of promise in the context of agentic coding\ntools. One of the key problems with coding agents is that it’s hard to build confidence\nthat they’ve done the right thing. We think Antithesis could serve as a valuable source of\nfeedback, both for using and for training such models.\n\nA future partnership\n\nThere’s one last part of this story to talk about: we were so impressed by the product and\nthe team behind it that we wanted to invest, and in the end, we’re leading their next\nround of funding. We love these kinds of partnerships not only because this is a\ntechnology that feels unique and aligned with our technical culture 1, but also because\nAntithesis has been so receptive to feedback, and is so passionate about what they’re\nbuilding.\n\nThis all lines up with Jane Street’s broader approach to private investing: we like to\nprovide long-term capital to companies where we understand the technology deeply and can\nsee the potential; where we like and believe in the people doing the work; and where\nthey’ve built something we’re excited to use ourselves as a customer. Antithesis hits all\nthose marks.\n\nOn a personal note, I’m really excited about this. The team at Antithesis is an absolute\npleasure to work with. 
I’ve never used a SaaS product where I got to talk directly to\ntheir engineers about bugs or specific behaviors, or to their designers about UX. And\ncountless colleagues of mine have had to hear me gush about just how cool it is.\nI’m always strangely excited to see what it digs up next.\n\n\n\n\n  \n    \n      After all, we already abstracted away our entire network layer to get this\nhigh-fidelity integration testing &#8617;\n    \n  \n\n",
        "url"      : "https://blog.janestreet.com/getting-from-tested-to-battle-tested/",
        "image"    : "https://blog.janestreet.com/getting-from-tested-to-battle-tested/thumbnail.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Advent of FPGA — A Jane Street Challenge",
        "date"     : "November 24, 2025",
        "authorId" : ["asinghani","bdevlin"],
        "author"   : null,
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "Update: We got over 200 submissions to this challenge, spanning a wide variety of HDL\nlanguages and hardware platforms! We featured our favorite submissions in the results blog\npost; check it out here\n\n\n\n\n\nAdvent of Code has long been a favorite December\nritual at Jane Street, with many participating in the month-long puzzle challenge that\nencourages thoughtful engineering and out-of-the-box thinking – very much our kind of fun.\nLast year, Anish, a hardware engineer at Jane Street, wrote about tackling the entire\nseries in Hardcaml, our OCaml-based hardware DSL, turning these puzzles into synthesizable\nFPGA circuits. If you missed it, his post, Advent of\nHardcaml, walks you through how implementing such algorithms\nin hardware became a unique exercise in architectural design and resource optimization.\n\nThis year, we’re inviting the community to join us in that spirit with the 2025 Advent\nof FPGA Challenge. When the final AoC 2025 puzzle drops,\npick any puzzles you like (at least one and up to as many as you want) to build\nsynthesizable RTL with realistic I/O; bonus points if you do it in Hardcaml. We’re excited\nto see the clever designs created across the academic and open‑source communities, and\nwe’d also love to get more people trying Hardcaml!\n\nHere’s how it works:\n\n\n  Timeline: submit all solutions by January 16, 2026. (Submission Form)\n  What to submit: your code (open-sourced), testbench, and a README or document\nexplaining your approach and how to run it.\n  Synthesizable RTL: designs should be synthesizable with realistic resource usage,\nbut you are not required to synthesize or run it on an FPGA.\n  Original work only: no duplicates or obviously AI-generated submissions. You should\nbe able to explain your design.\n  RTL only: any RTL language (Verilog, VHDL, Chisel, Amaranth, Filament, and similar)\nis welcome, and Hardcaml is, of course, encouraged. 
High Level Synthesis languages are\nnot supported.\n  Hardcaml resources are available here:\n    \n      Hardcaml template/demo project + setup instructions\n      Hardcaml tutorials + examples\n    \n  \n\n\nWe’ll pick the three most inventive & creative contributions (as voted by the Jane Street\nhardware team) to receive an FPGA dev kit (the Zynq UltraScale+ Kria\nKV260)\nto try your designs at home, as well as some Jane Street swag. Anyone who successfully\ncompletes at least one puzzle in Hardcaml will also win a Jane Street Hardcaml T-shirt.\n\nNot sure where to start? Consider exploring:\n\n\n  Scalability: can your design handle inputs 10× or even 1000× larger?\n  Efficiency: push area/performance trade-offs in the hardware.\n  Architecture: exploit FPGA-native parallelism and pipelining you can’t do on a CPU.\n  Physical synthesis: try an open-source ASIC flow (e.g., TinyTapeout).\n  Language features: showcase unique features of your favorite HDL language to generate\nelegant hardware beyond what you can do in Verilog/VHDL.\n\n\nWhen you’re ready with all of the puzzle solutions you’d like to be considered, you can\nsubmit your response here.\nThe winning solutions may be featured in a blog update!\n",
        "url"      : "https://blog.janestreet.com/advent-of-fpga-challenge-2025/",
        "image"    : "https://blog.janestreet.com/advent-of-fpga-challenge-2025/calendar.gif",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "What the interns have wrought, 2025 edition",
        "date"     : "August 27, 2025",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : ["internship"],
        "minsToRead" : 16,
        "content"  : "Yet again, we’re at the end of our internship season, and so it’s time to summarize what\nthe interns were up to!\n\nThis year, I was recommended a real bumper crop of exciting projects to include. It’s kind\nof crazy how many great intern projects are out there. To mention a few that I’m not\ngoing to have time to cover in detail:\n\n\n  Annie Hu spent a big chunk of her summer investigating, implementing, and optimizing\ndifferent neural net sequence models, trying out a variety of compilation techniques and\ntoolchains.\n  Aster Oransoy added build priorities to our build systems’ shared action execution\nservice, so you can ensure low-priority builds don’t slow down high-priority ones.\n  Allen Pei wrote a quickcheck-like\nsystem for creating automated tests of trading systems by generating randomized\nsequences of market events, along with shrinking heuristics for creating minimal test\ncases.\n  Evan Thompson wrote an LSP for our inline CSS syntax extension which includes a CSS\nvalidator that found tons of instances of invalid CSS in our applications.\n  Zhibo Chen added a generic form of optional arguments to OCaml, so that it can use other\ntypes than the traditional OCaml option type (including more efficient\nrepresentations) for optional values.\n  Conor Kennedy added predicate\npushdown to our internal data\nwarehouse system to do filtration before it gets to the full query engine, and even\nwrote a mini query planner for analyzing filter expressions to derive narrower key\nranges.\n  Joe Cutler worked on using JIT-ing to make our\nHardCaml simulator fast enough to be\ncompetitive with Verilator, but with much better\nstart-up times.\n\n\nAnd those are just the ones I felt like I could explain in a handful of words each!\n\nAs usual, I picked just three projects to go into in more detail. 
In particular:\n\n\n  Leo Gagnon wrote a (sometimes dramatically) more efficient evaluator for JSQL, our\ninternal SQL dialect that we use for lots of different user-facing tools.\n  Aryan Khatri built a new version of our OCaml torch\nbindings that leverage OxCaml’s new features for\ncontrolling memory management to build bindings that clean up tensors safely and\ndeterministically.\n  Anthony Li wrote a library for managing memory across processes within our trading\nsystems via ref-counting, making it possible to more efficiently and safely ship data\nacross the process boundary.\n\n\nLet’s dive in!\n\nFaster (J)SQL evaluation\n\nWe use a lot of SQL at Jane Street, both in standard Postgres (or similar) databases\nfloating around, and for accessing our own homegrown analytics-oriented data warehouse\nsoftware.\n\nOver time, we came to realize that SQL was sufficiently well-known internally that we\nwanted to use it beyond the context of databases, as a general language for filtering and\ntransforming tabular data.  This could be useful in all sorts of contexts: web UIs, data\nvisualization tools, trading-systems configuration tools, etc.\n\nThe problem with this idea is… which version of SQL should you use?  Every database you\nlook at has its own charmingly unique SQL dialect that’s almost but not quite the same as\nall the others.\n\nWe decided to deal with this by (I\nknow, I know) building our own dialect of SQL called JSQL.  We’ve\nbuilt a bunch of tools for using JSQL, including parsers, translators to other SQL\ndialects, web-UI components, and a collection of different in-memory evaluators for\ncomputing the results of a JSQL expression without invoking a traditional database at all.\n\nOur evaluators started out very simple, doing little more than walking through a collection\nof rows and one-by-one evaluating whether they passed or failed a WHERE clause. 
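In spirit, that first-generation approach amounted to something like the following sketch. (The types and names here are invented for illustration; the real JSQL evaluators are much richer.)\n\n```ocaml\n(* A naive row-at-a-time evaluator: the WHERE clause is just a\n   predicate applied to every row in turn. Illustrative only. *)\ntype paper = { author : string; publication_year : int }\n\nlet eval_where rows ~where = List.filter where rows\n\nlet () =\n  let rows =\n    [ { author = \"Dijkstra\"; publication_year = 1982 }\n    ; { author = \"Lamport\"; publication_year = 1978 } ]\n  in\n  let hits =\n    eval_where rows ~where:(fun r ->\n      r.author = \"Dijkstra\" && r.publication_year &gt; 1980)\n  in\n  assert (List.length hits = 1)\n```\n\nThis is O(n) in the number of rows no matter how selective the query is, which is what motivated the indexing work described below.\n\n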
Over time,\nwe’ve built multiple evaluators with different performance properties, including\nincremental evaluators.\n\nThat said, none of our evaluators were all that sophisticated, and in particular, none of\nthem made use of indexing. Leo Gagnon’s project was to change that!\n\nThe idea was that, when presented with data in an indexed container like a Map.t\nor Hashtbl.t, the evaluator should be able to use that indexing to more efficiently filter down to the\ndata you need.  So, if you have a SELECT statement where the WHERE clause contains:\n\nauthor = \"Dijkstra\" AND publication_year &gt; 1980\n\n\n\nand the underlying data is contained in, say, a Paper.t list String.Map.t (a map from author names to \nlists of their papers), Leo’s evaluator would have to:\n\n\n  determine that we only care about things under the key \"Dijkstra\",\n  use an O(log n) Map.find to get the resulting Paper.t list,\n  use List.filter on the resulting much smaller list to select the papers with\npublication_year &gt; 1980\n\n\nThat’s way more efficient than walking over the entire map.\n\nGetting this done involved a bunch of steps!\n\n\n  \n    Building a selection type that represented the possible over-approximations of the\nrange of keys that would be needed to satisfy a given query.\n  \n  \n    Writing code to extract and optimize the selection for a given query.\n  \n  \n    Writing code to specialize the execution of the selection to the backing store for\nthe data.  For example, the selection type tracks when ranges of queries are in scope.\nThe Map.t type supports efficient range queries, but the Hashtbl.t type doesn’t, so\nyou need different execution strategies depending on which you use to store your data.\n  \n  \n    Supporting multi-index data-structures, like our Immutable_indexable_bag.  
This\ninvolved building selection heuristics that help us pick the most efficient index to\nuse.\n  \n\n\nAnd, of course, benchmarking.\n\nThe results of that benchmarking were pretty promising.  We ran some sample queries over\n3.8 million rows of test data, comparing a linear scan over an array versus an\nindex-optimized scan over a Map.t.\n\nThis first query shows a ~700x speedup, since it lets us zoom in on just the MSFT trades,\nignoring everything else.\n\nSELECT * WHERE und = \"MSFT US\" AND event_date &gt; \"2025-01-01\"::date\n+----------------------------------+--------------------+\n| aggregator_name                  | average_time       |\n+----------------------------------+--------------------+\n| jsql-aggregations-eval           | 15.844514478s      |\n| jsql-indexed-aggregations-eval   | 21.939788ms        |\n+----------------------------------+--------------------+\n\n\n\nThis second query is more complicated, in that it requires us to do a scan over a range of\nvalues, but we still get a ~30x speedup here.\n\nSELECT * WHERE\n  (und = \"MSFT US\" OR (und &gt;= \"AAPL US\" AND und &lt; \"AMZN US\"))\n  AND event_date &gt; \"2025-01-01\"::date\n+--------------------------------+--------------------+\n| aggregator_name                | average_time       |\n+--------------------------------+--------------------+\n| jsql-aggregations-eval         | 37.056874003s      |\n| jsql-indexed-aggregations-eval | 1.324532585s       |\n+--------------------------------+--------------------+\n\n\n\nDespite this being a pretty algorithmic and performance-oriented project, a lot of the\nchallenges turned out to be about API design, and getting all of this work done with a\ncodebase that was simple and readable, and presented a convenient API to users.\n\nBetter Torch bindings\n\nWe use PyTorch a lot as part of our machine learning efforts, and as you might expect,\nmost of that work is done in Python. 
But sometimes, we want to drive PyTorch from OCaml,\nwhich we do using ocamltorch, originally written by\nLaurent Mazare some years back.\n\nBut OCaml is in some ways an awkward match for PyTorch, because OCaml manages memory using\na tracing GC, in contrast to Python, which uses a refcounting GC.\n\nA lot of ink\nhas\nbeen\nspilled on the\ntradeoffs between refcounting and tracing, but one clear difference is around the\ndeterminism of collection.  With a tracing GC, it’s hard to know when the memory you’ve\nallocated will be reclaimed. With refcounting, your object will be collected the moment\nyou drop your last reference to it.\n\nThis determinism comes in handy when you’re using your collector for managing things other\nthan main memory, like precious GPU memory.  This is a plot of GPU memory usage over time\ndoing one forward and backward pass on a batch, then some sampling, then 3 more batches,\nwritten naively with ocamltorch.\n\n\n\nThis behavior is pretty awful! We’re holding on to tensors we just don’t need anymore,\nwhich is basically intolerable.\n\nYou’d deal with this in ocamltorch by carefully calling Gc.full_major () after each\nbatch and each token sampled, to force the GC to recognize that the memory is unused and\nreclaim it. 
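Sketched out, that manual approach looks something like this. (Model, Optimizer, and Tensor stand in for the ocamltorch API here; the names and loop shape are illustrative, not runnable as-is.)\n\n```ocaml\n(* Illustrative sketch: force a full major collection after each\n   batch so the GC promptly frees the tensors allocated during\n   that step. *)\nfor _batch = 1 to 100 do\n  Optimizer.zero_grad opt;\n  let ys_ = Model.forward model xs in\n  let loss = Tensor.(mean (square (ys - ys_))) in\n  Tensor.backward loss;\n  Optimizer.step opt;\n  (* Expensive, and easy to forget or misplace. *)\n  Gc.full_major ()\ndone\n```\n\n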
That gives you the desired memory behavior:\n\n\n\nbut it’s a poor solution, since the calls to the GC are expensive, and there’s no\ndiscipline to help you make sure you put them in the right place.\n\nAryan’s project was to build a better API for ocamltorch that provided a safe and\nefficient discipline for managing tensor memory, leveraging some of the new features of\nOxCaml, a set of extensions to OCaml that have been developed at Jane\nStreet.\n\nThe basic idea is to introduce a way of marking a scope of allocation for a tensor, using\nthis with_rc_scope function, where “rc” is short for “reference count”:\n\nval with_rc_scope : (unit -&gt; 'a) @ local -&gt; 'a\n\n\n\nThe idea is that the body of the closure passed to this function acts as a scope, and that\nany tensors allocated within it will have their refcounts decremented when the function\nends.\n\nTo make this all work, we use OxCaml’s local\nmode to make sure that tensors can’t\nescape their scope.  In particular, any function that allocates a tensor will allocate it\nas a local value:\n\nval ( + ) : t @ local -&gt; t @ local -&gt; t @ local\n\n\n\nThis prevents the allocated value from being returned from the closure passed to\nwith_rc_scope.\n\nHere’s a worked example of how you might use this in practice.\n\n    let vs = Var_store.create ~name:\"vs\" () in\n    let opt = Optimizer.sgd vs ~learning_rate:1e-3 in\n    let model = Model.init vs in\n    for index = 1 to 100 do\n      Tensor.with_rc_scope (fun () -&gt;\n        Optimizer.zero_grad opt;\n        let ys_ = Model.forward model xs in\n        let loss = Tensor.(mean (square (ys - ys_))) in\n        Tensor.backward loss;\n        Optimizer.step opt)\n    done;\n\n\n\nThe full API is a bit more complicated than just that.  The system has support for nested\nscopes, which is needed to support many of the idioms that are used in practice for both\ntraining and inference workflows on GPUs.  
As part of that, there is some special support\nfor returning tensors from an inner scope to an outer scope in a controlled way that\ndoesn’t violate the reference counting rules.\n\nThe project itself involved a lot of experimentation at the API level, to design\nan API that was easy to use and understand and that also captured the memory-use patterns\nwe run into in practice. The project also had an interesting performance-engineering\naspect to it: removing all of the now-unnecessary GC invocations made it easier to understand\nand identify further inefficiencies (like unnecessary synchronizations between the CPU and\nGPU) that were harder to see amongst the performance mess created by the full_major\ninvocations.\n\nWe have more ideas about how to extend and improve these interfaces, but we already expect\nthe new APIs to be quite useful in their current form.  This is part of our open-source\ncode, so once the new code is released, you’ll be able to find it\nhere.\n\nRef-counted objects in shared memory\n\nAt Jane Street, we have lots of performance-sensitive trading systems that gather complex\ninformation over the course of their execution, and then periodically serialize pieces of\nthat data over a shared-memory channel to another process.\n\nThis is generally a pretty good approach, but it has its limitations.  Serialization\nalways has a cost, but here it’s made worse by the fact that the data we want to send is\ncomplex nested data with shared structure between messages. As a result, serializing the\ndata can involve serializing the same sub-structures over and over.\n\nAnthony Li’s project was to build a library supporting a very different – and much more\nefficient – approach.\n\nThe idea is to get rid of the serialization and deserialization altogether, and to just\npass pointers to the values in question instead. 
This requires that the space of objects\nin question is visible to both processes, so it means we need to allocate those objects\nwithin a shared memory segment.\n\nWe already have support for managing pools of objects in a shared memory segment, so this\nsounds easy enough at first glance.  But the tricky bit is figuring out when you can\nrecycle one of your pooled objects.\n\nWe can’t rely on OCaml’s ordinary GC for this because the data resides in a shared-memory\nsegment between two processes, each with their own GC.  And anyway, we don’t want to be\nchurning the garbage collector in a latency-sensitive trading system.\n\nInstead, Anthony’s project was to use a tried-and-true technique for this: reference\ncounting.\n\nSafer refcounting through modes\n\nReference counting is tricky to integrate into a language like OCaml that doesn’t have it\ndesigned in from the start.  There are really three invariants you need to get right for\nthis system to work:\n\n\n  There are no data-races on the refcounts or the objects themselves\n  Refcounts are incremented every time a new reference is created\n  Refcounts are decremented every time a reference is destroyed\n\n\nBut how do we ensure that these rules are followed when they’re not natively enforced by\nthe runtime?  This is a bit like the problem Aryan ran into with reference counting in\nPyTorch, and again, the solution is to leverage the system of modes to ensure the\nnecessary invariants.\n\nWe’ll need different modes at different times, so in order to manage this, we’re going to\nhave a special handle object o Handle.t that guards access to the underlying object (of\ntype o).  
We can use modes both to protect the use of the handle itself, and to let the handle\nrelease the object o with specific modal types under specific circumstances.\n\nThat’s all a bit abstract, so let’s talk about the details:\n\nEliminating data races\n\nThere are really two data-race questions to handle here: one is about the refcounts, and\nthe other is about the actual objects being managed.  For the refcounts, an atomic\ncompare-and-set operation can be used to manage them in a safe way, so that’s pretty\nsimple, and doesn’t require anything from the mode system.\n\nThe mutability of the objects is more complicated, because the rules are different at\ndifferent times.  The objects must be mutable on initialization, since they have to be\nfilled in at that point.  But once you have multiple readers of the object, you really\nneed them to not change.  It turns out we can leverage OxCaml’s\nvisibility mode axis, which includes\nimmutable, read, and read_write modes.\n\nSpecifically:\n\n\n  \n    During initialization, we expose the value under the read_write mode (which is the\ndefault), so the data in the object can be set.  Notably, at this point, we’re\nguaranteed there’s only one reference to the object in question.\n  \n  \n    When reading, we expose objects under the read mode.  This way, multiple readers (even\nacross processes) can access the same object without fear of a race.\n  \n\n\nNotably, once an object’s reference count goes back to zero, it can again be the subject\nof an initialization, so it can again be exposed read_write.\n\nAnother interesting aspect of this is that when we release the underlying values, we do so\nunder the local mode, to prevent the value from escaping its intended scope. As such,\nwhat we’re implementing is analogous to borrow-checking in Rust.\n\nManaging increments and decrements\n\nThe key invariant here is that people don’t just go about duplicating handles without\nincrementing the associated reference count.  
To ensure this, each Handle.t is created\nunder the unique mode, and all operations that use handles require that they be provided\nuniquely.\n\nThis guarantees that all handles that are used are held uniquely, and so if you want to\nrefer to the handle in multiple places, an explicit copy function must be called.  And,\ncritically, that copy function increments the reference count.\n\nThere’s also a free operation that consumes a handle and decrements the reference count.\nAnd a way of sending a handle to another process, at which point the sending handle is\nconsumed, and a receiving handle is created, without changing the reference count.\n\nAnthony’s library is complete, and the team is now working it into our production systems.\nWe hope that this will be a library that’s useful to multiple teams across the firm.\n\nJoin us!\n\nIf this sounds like a fun way to spend your summer, you should apply to our internship\nprogram.  Jane Street interns\nget a chance to solve fun and challenging problems that have real impact.  I hope this\npost gives you a sense of that!\n",
        "url"      : "https://blog.janestreet.com/wrought-2025/",
        "image"    : "https://blog.janestreet.com/wrought-2025/curio-cabinet.png",
        "topic"    :  ["technology","internship"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "A Higgs-bugson in the Linux Kernel",
        "date"     : "July 2, 2025",
        "authorId" : "njha",
        "author"   : "Nikhil Jha",
        "tags"     : [],
        "minsToRead" : 14,
        "content"  : "We recently ran across a strange higgs-bugson that manifested itself in a critical system that stores and distributes the firm’s trading activity data, called Gord. (A higgs-bugson is a bug that is reported in practice but difficult to reproduce, named for the Higgs boson, a particle that was theorized in the 1960s but only confirmed in 2013.) In this post I’ll walk you through the process I took to debug it. I tried to write down relevant details as they came up, so see if you can guess what the bug is while reading along.\n\nSome useful background information about NFS with Kerberos\n\nThe NFS (“Network File System”) protocol is designed to access a regular POSIX filesystem over the network. The default security story of NFSv3, which is what we’re using here, is roughly “no security” on an untrusted network: the server only checks whether or not the client is connected from a “privileged” port number (i.e. less than 1024). If the client says it’s connecting on behalf of a particular user, the server just trusts the client. What could go wrong?\n\nThe other security option for NFS is Kerberos. When used with NFS, Kerberos cryptographically verifies the identity of the user accessing the file.\n\nWhat’s the bug?\n\n\n\nGord often does large file copies to ship data around. These copies would very rarely fail with -EACCES (Permission denied) despite the permissions being correctly set on the filesystem. Although retries were possible, it would be sad to lose progress copying these files. Also, strange errors in data storage are scary! It’s possible that spurious errors could indicate a larger issue.\n\nThere was no obvious pattern in these copies failing. Even identical jobs running simultaneously didn’t necessarily fail together. 
We did have one clue: if we switched Kerberos off in the dev environment (because the error sounded auth related), the copies never failed.\n\nSo, maybe something was wrong with the Kerberos credentials?\n\nHow does the kernel get your Kerberos credentials?\n\n\n\nIn a typical userspace program making use of Kerberos, libkrb5 will parse some environment variables or a config file to find the location of a Kerberos credentials cache. However, applications using NFS don’t need to link libkrb5 or otherwise know anything about Kerberos. They just do normal file I/O syscalls (open, read, write, etc) as if they were accessing a local filesystem. So what’s going on?\n\nIt turns out the kernel gets credentials via a root userspace daemon called rpc_gssd. When your application does its first I/O syscall to a file on NFS, the kernel writes to a file on a special mountpoint to communicate with rpc_gssd. (Fun fact: this mountpoint, rpc_pipefs, is an entirely separate filesystem implementation in the Linux kernel, just like ext4 or nfs itself.)\n\nMaking some simplifications, the rpc_gssd program grabs the user’s credentials, constructs the right Kerberos service ticket, and writes to the rpc_pipefs again with the result. This involves an API called GSSAPI (“Generic Security Services API”), which you’ll see mentioned throughout this post.\n\nLooking at the rpc_gssd logs around the time of the bug, I noticed that the kernel hadn’t requested credentials for a while. The most recently requested credential should also have been fresh for another few hours. So, this was a dead end.\n\nTrying to reproduce the bug\n\n\n\nI decided to try my luck by running a slow trickle of writes over the weekend. It seemed like the issue would be key-related somehow, so having a long-running process would force key expiry and plausibly reproduce the bug.\n\nChecking back in on Monday, none of the dozen boxes I ran this on failed. 
This wasn’t too surprising, because the issue was pretty rare in production.\n\nI then generated some large (~200GB) random files, put them on a test NFS mount, and started copying them to another test NFS mount in a loop on even more boxes. Once again, none of these copies failed.\n\nAt this point I was surprised I hadn’t seen the issue. To make sure I wasn’t just getting extremely unlucky, I decided to scale up the number of copies running in parallel.\n\nI was worried about wasting bandwidth and impacting other users, so I thought a bit about how to make the test a bit more lightweight. One easy win would be to copy from a local disk to NFS instead of from NFS to NFS. However, I had already requested boxes with tiny disks. Fortunately, I had a cool trick in mind.\n\nWriting a filesystem\n\nI decided to create a filesystem that contains large random files, but fits entirely in memory on small test machines. Here’s the idea: Instead of storing actual file content, I’d use a noise function (or hash function) to generate consistent random bytes on demand.\n\nThis turns out to be fairly straightforward. There’s a Rust crate called fuser that provides a nicely typed implementation of FUSE (“Filesystem in USErspace”). Around 20 minutes later (with some assistance from Claude), I had a fake filesystem that “contained” some large files to copy from.\n\nThis ultimately did take less time than it would have taken to use larger boxes, but felt slightly yak-shavy before I was sure this would work. I don’t think I would have attempted this trick if I didn’t have an AI assistant to write most of the filesystem for me.\n\nimpl Filesystem for RandomFs {\n    // ... removed some short functions that return metadata\n\n    fn read(\n        &mut self,\n        ino: u64,\n        offset: i64,\n        size: u32,\n        reply: ReplyData,\n        // .. 
removed some unused arguments\n    ) {\n        let data = self.generate_random_data(ino, offset, size);\n        reply.data(&data);\n    }\n}\n\n\n\nInserting arbitrary code into the Linux kernel at runtime\n\nThe next thing I wanted to do was collect some debug information. If the reproducer did work, I would want to see the kernel stack traces for the syscall that was returning EACCES. I needed to be prepared for this beforehand, because I expected the bug to take a while to show up.\n\nThere’s a Linux kernel subsystem called “eBPF”, which stands for “extended Berkeley Packet Filter”. As you might imagine, it’s supposed to let you filter network packets. However, it has since eaten the world and now lets you insert ~any code you want at the start or end of basically any function in the Linux kernel at runtime. This is fine. Everything’s going to be ok. Don’t worry about it!\n\nThere’s a handy tool called bpftrace that can quickly print arguments and return values of kernel functions (among other things). I wrote a bpftrace script that instrumented a few interesting-looking functions, something like this:\n\nfexit:auth_rpcgss:gss_cred_init {\n   // If it was going to return -13 (EACCES)...\n    if ((int64)retval == -13) {\n        // Print out the kernel call stack at this time (will tell us what called this function).\n        printf(\"gss_cred_init returned -13\\n===backtrace===\\n%s===end_backtrace===\\n\", kstack);\n    }\n}\n\n\n\nThe example above looks at the gss_cred_init function, and prints out the kernel stack trace if it returns an EACCES error. 
This is a very simple example, but definitely check out the bpftrace manual for other functionality.\n\nBack to reproducing the issue\n\nThe test setup was as follows:\n\n\n  Some jobs that run rsync processes to copy from the FUSE filesystem to a test NFS server.\n  A bpftrace script that watches for -EACCES being returned from relevant kernel functions.\n  A way to take a packet capture (PCAP) of just the time surrounding a returned -EACCES.\n\n\nAnd… It worked! -EACCES! Weirdly, a third of my test boxes failed at the same time? That never happened in production. Usually only one or two Gord jobs would fail at a time. One bpftrace message stood out: “gss_validate returned -13 (GSS_S_BAD_SIG)”.\n\nBad signature??? What? Of all the things that would make sense, this made sense the least. Was the server returning a bad signature? Was the client failing to verify it correctly? Was there memory corruption somewhere? Keep in mind all of this software is written in C, so almost anything is possible. Even nasal demons. If this was memory corruption, maybe I found a security vulnerability?\n\nI peeked at the packet capture of the bug in Wireshark and did not see any obvious signs of corruption. Other interesting things I noticed were:\n\n\n  There were a lot of retransmissions at the NFS level. The test NFS server I was using was small and probably got overloaded.\n  TCP frames were being split up and reassembled.\n  Again: A third of my jobs failed together, which was unexpected given what I saw in production.\n\n\nI didn’t have any good guesses based on the above. Maybe I could try to generate the signature myself to compare it with what’s in the packet? I knew Wireshark could decrypt Kerberos requests in network packets given the user’s Kerberos password, which was enough to grab the signature key (GSS token). All I needed to do was write a program to compute the signature given that token. 
Seems simple enough in theory, but how exactly do you do that?\n\nKerberized NFS packet format\n\n\n\nAn NFS request looks something like this. Some interesting things to call out here are:\n\n\n  There’s an XID, which matches responses to requests. A client can have multiple requests in flight, and the server can respond to them out of order, so an ID is necessary.\n  The credentials field specifies which GSS context the RPC request is associated with, and includes an incrementing sequence number (“GSS sequence number”). Note that this is a separate sequence number from the XID.\n\n\nIn the request, the checksum is the HMAC of roughly all the data in the request header, using the shared GSS key. In the response, the checksum is the HMAC of the GSS sequence number from the request.\n\nAn HMAC is a “Hash-based Message Authentication Code” – it allows someone with knowledge of the key to verify that someone else with the same key created the checksum.\n\nWriting a Wireshark plugin\n\nThe next thing I did was write a Wireshark plugin to compute the checksums of replies.\n\nWhile writing the Wireshark plugin I ran into a problem: there were retransmissions in my PCAP, so how do I figure out which of the retransmitted requests corresponds to a response? This was throwaway code for debugging, so I decided to make a big shared-mutable hashmap containing a map from XIDs to GSS sequence numbers. I updated the hashmap whenever Wireshark processed a frame containing an NFS request, assuming it would process them in order.\n\nThen, I loaded up my packet capture and browsed to the response with an XID that failed verification.\n\n&gt; Checksum Matched\n\nOkay. So the checksum in the packet is correct. Why did the kernel think it wasn’t? I clicked back to the request in the PCAP to take a look. Annoyingly, there were two requests with the same XID, meaning that a retransmission was involved. I then clicked back to the response.\n\n&gt; Checksum Mismatch\n\nHuh. 
Was my Wireshark plugin buggy?\n\n(At this point I think you should have all the information you need to guess what the bug is. It might be fun to think through this. When you’re ready, read on.)\n\nThe bug finally clicks\n\nRemember how I wasn’t sure which request to use to get the GSS sequence number from? It turns out the kernel has the exact same bug!\n\nSunRPC matches responses to requests via their XIDs, so if the server is overloaded and takes a while to respond, the NFS client might retransmit the request. The checksum field in the response is an HMAC over the request’s GSS sequence number. Note that this is not the XID, and is not included in the response. When the kernel retransmits a request with the same XID, it uses a new sequence number and updates the GSS sequence number it has recorded. If the kernel then receives the response that was associated with the old GSS sequence number, checksum validation fails. If this happens 3x in a row, -EACCES is returned to userspace.\n\n\n\nThis is almost self-fulfilling because each failure creates another retry. It is not guaranteed, however: you can still get lucky with timing and avoid the bug.\n\nBasically, the only reason I was able to reproduce the bug is because I was using a tiny test NFS server, causing latencies in the hundreds of seconds. If I had kept going with low-load testing, I probably would have had to use another method to find the bug.\n\nA quick read of some kernel source code confirmed that what I thought was happening could happen, but to be sure, I decided to write a lightweight reproducer that works by delaying packets.\n\nBecoming a human firewall\n\nThere’s a kernel facility called NFQUEUE which allows you to use a userspace process for packet filtering. This is probably intended for security use cases, but what I did was hook it up to a Python script where I can individually look at packets and press y to let them through after enough time has passed to trigger the bug. 
Basically, I could manually simulate high latency by being a very very slow human firewall.\n\nThen it was a matter of writing a little more glue code, and I had a fully automatic reproduction script.\n\nFixing the bug\n\nAt this point I reported my findings to my team, who quickly noticed that the RFC actually does mention this case.\n\n\n  “Then when it receives a response with a matching RPC transaction identifier, it can compute the checksum of each sequence number in the cache to try to match the checksum in the reply’s verifier.” - RFC2203 5.3.3.1. (Page 13)\n\n\nThe Linux kernel does not actually implement this cache as suggested by the RFC, so I wrote a kernel patch to add this functionality and mailed it off upstream. I also learned that the FreeBSD kernel actually already implements this, so this is new-to-Linux but not new-to-NFS.\n\nMore importantly, though, all that this cache does is increase the amount of retries needed to hit a bad interleaving. The fundamental problem is that a sequence number mismatch should not cause an immediate retransmission, which makes the problem self-fulfilling. So, I wrote a second kernel patch to not retransmit if a bad checksum is seen.\n\nThis feels principled, since a checksum mismatch suggests network tampering, so it makes sense to treat it as if we didn’t receive a message at all. The normal timeout logic can take care of retransmission in the unlikely case that one is needed. As final verification, I applied these patches and made sure that the test copy jobs and the Python reproducer no longer failed.\n\nBoth of these patches are now upstream and will be available in Linux 6.16.\n",
        "url"      : "https://blog.janestreet.com/a-higgs-bugson-in-the-linux-kernel/",
        "image"    : "https://blog.janestreet.com/a-higgs-bugson-in-the-linux-kernel/prod_vs_test.apng",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Introducing OxCaml",
        "date"     : "June 14, 2025",
        "authorId" : "lwhite",
        "author"   : "Leo White",
        "tags"     : [],
        "minsToRead" : 4,
        "content"  : "At Jane Street, we’ve been actively making improvements to OCaml for a long time. Over the\nlast few years, we’ve started to build some fairly ambitious extensions to the\nlanguage. Our aim is to make OCaml a great language for performance engineering. This\nwork has always been open source, and our hope is to contribute these extensions to\nupstream OCaml, but we’re still iterating on their design as we gain experience using\nthem. As such, we think the time has come to make it easier for people to use our\nextensions in the outside world. That starts with giving our branch of the compiler the\nthree most important components of a modern programming language: a cool name, a cute logo\nand a snazzy website. So, without further ado, we are excited to announce OxCaml.\n\nWe’ve also built a new website, oxcaml.org, that includes\ninstructions for how to install OxCaml and some\ntutorials and documentation of the extensions to get\nyou started.\n\nA fast-moving set of extensions to the OCaml programming language\n\nOxCaml is both Jane Street’s production compiler and a laboratory for\nexperiments focused on making OCaml better for performance-oriented\nprogramming. Our hope is that these extensions can over time be contributed to upstream\nOCaml.\n\nDesign goals\n\nOxCaml’s primary design goals are:\n\n\n  \n    To provide safe, convenient, predictable control over performance-critical aspects of\nprogram behavior\n  \n  \n    but only where you need it,\n  \n  \n    and in OCaml!\n  \n\n\nWhat does this mean?\n\nOxCaml’s extensions are meant to make OCaml a great language for performance\nengineering. Performance engineering requires control, and we want that control to be:\n\n\n  \n    Safe. Safety is a critical feature for making programmers more productive, and for\nshipping correct code. Languages that are pervasively unsafe are too hard to use\ncorrectly.\n  \n  \n    Convenient. 
We want to provide control without bewildering programmers, or drowning\nthem in endless annotations. To achieve this, we aim to maintain OCaml’s excellent\ntype-inference, even as we add considerable expressiveness to the type-system.\n  \n  \n    Predictable. One of the great features of OCaml today is that it’s pretty easy to\nlook at OCaml code and understand how it’s going to perform. We want our extensions to\nmaintain and improve on that property, by making key performance details explicit at the\ntype-level.\n  \n\n\nBy “only where you need it”, we mean that OxCaml’s extensions should be\npay-as-you-go. While OxCaml aims to provide more power to optimize, you shouldn’t need to\nswallow extra complexity when you’re not using that power.\n\nBy “in OCaml”, we mean that all valid OCaml programs are also valid OxCaml programs. But\nour more profound goal is for OxCaml to feel like OCaml evolving into a better version of\nitself, rather than a new language. For that, OxCaml needs to honor OCaml’s basic design\nsensibility, and to preserve the safety, ease, and productivity that are hallmarks of the\nlanguage.\n\nOxCaml’s extensions\n\nOur extensions can be roughly organized into a few areas:\n\n\n  \n    Fearless concurrency: Writing correct concurrent programs is notoriously\ndifficult. OxCaml includes additions to the type system to statically rule out data\nraces.\n  \n  \n    Layouts: OxCaml lets programmers specify the way their data is laid out in\nmemory. 
It also provides native access to SIMD processor extensions.\n  \n  \n    Control over allocation: OxCaml gives programmers tools to control allocations,\nreducing GC pressure, making program behavior more cache efficient and deterministic.\n  \n  \n    Quality of life: OxCaml also contains some extensions that aren’t specifically about\nsystems programming, but which we’ve found helpful in our day-to-day work:\n    \n      Polymorphic parameters\n      Include functor\n      Labeled tuples\n      Immutable arrays\n    \n  \n\n\nOxCaml also builds on our Flambda2 middle-end, which improves on the performance of OCaml’s\noptimizer, and powers many of the extensions we need to make OxCaml’s new language\nfeatures work.\n\nUsing OxCaml\n\nOxCaml is open-source, and we’re excited to welcome experimental users, especially\nresearchers and tinkerers who can kick the tires and provide feedback on the system. We\nput the emphasis on experimental because OxCaml makes no promises of stability or\nbackwards compatibility for its extensions (though it does remain backwards compatible\nwith OCaml).\n\nOxCaml is intended to be easy to use, and to that end comes with modified versions of the\nstandard OCaml tool-set, including:\n\n\n  Package management, compatible with dune and opam\n  Editor integration via the LSP-server\n  Source code formatting\n\n\nJane Street has long open sourced a bunch of useful libraries and tools. These are now\nreleased in two forms: one for upstream OCaml, in which our extensions have been stripped,\nand one for OxCaml, where the extensions are fully leveraged.\n\nNot all extensions are erasable, so some libraries will be available only for\nOxCaml. We’ll export OCaml-compatible versions of these libraries when the necessary\nextensions are integrated upstream.\n",
        "url"      : "https://blog.janestreet.com/introducing-oxcaml/",
        "image"    : "https://blog.janestreet.com/introducing-oxcaml/hero-desktop.svg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Advent of Hardcaml",
        "date"     : "March 22, 2025",
        "authorId" : "asinghani",
        "author"   : "Anish Singhani",
        "tags"     : [],
        "minsToRead" : 12,
        "content"  : "Update: For the 2025 Advent of Code, we ran an Advent of FPGA Challenge where we\ninvited the community to implement their own synthesizable solutions to this year’s\npuzzles! We got over 200 submissions spanning a wide variety of HDL languages and hardware\nplatforms; check out our favorite solutions in the results blog\npost.\n\n\n\n\n\nAdvent of Code is an annual Advent calendar\nfeaturing small programming puzzles created by Eric Wastl, which has been running\nevery December since 2015. Being the puzzle-lovers we are, a bunch of us at\nJane Street participate in Advent of Code every year.\n\nThere’s a recurring tradition in the Advent of Code community known as “Upping\nthe Ante” which entails solving or visualizing the puzzles in some unorthodox\nway, whether by code-golfing, using esoteric programming languages, or Excel,\nSQL—even implementing solutions entirely inside of video games.\n\nFor 2024, I spent a few weekends trying to solve some of the puzzles entirely\non an FPGA. It’s an interesting challenge adapting classic algorithms like\ngraph traversal, sorting, dynamic programming, and recursion to the unique\ncapabilities of the FPGA. Choosing to target a relatively small FPGA part (84K\nLUTs, 4 Mbits of RAM) also requires some algorithmic optimizations which would\nnot be a consideration if running on a regular computer.\n\nWhy use Hardcaml for this project?\n\nI decided to implement this project in Hardcaml,\nwhich is an open-source Hardware Description Language embedded inside of OCaml that we\nmaintain at Jane Street. 
Hardcaml includes a simulation backend as well as support for\ncompiling down to RTL, so the entire project, from designing the hardware to validating\nthe design to synthesizing it for an FPGA, is done in OCaml.\n\nTo get the advantages of implementing software algorithms in hardware, you usually have to\n“unroll” iterative or sequential operations into a hardware pipeline that can compute\nmany things in parallel. Traditional HDLs offer only very primitive generation features,\nso this unrolling becomes unwieldy, and tends to result in code that’s hard to read and\nto reuse across projects.\n\nHardcaml’s strong type system and expressive metaprogramming capabilities allow us to\nimplement highly flexible circuits concisely using OCaml. This also eliminates the risk of\ntype-confusion bugs (in hardware, everything’s just a wire, but we can assign semantic\nmeanings to those wires using OCaml’s type system), and makes the resultant code easier to\nunderstand and adapt. The ability to parametrize components using functors and\nhigher-order functions lets us easily reuse them across designs and projects.\n\nAll of this also makes the development experience a lot more efficient and\nenjoyable, which are particularly important traits when working on a\nside-project.\n\nTargeting a new FPGA platform\n\nIn keeping with the open source theme, I decided to target the Lattice ECP5\nFPGA chip, using the open-source Yosys +\nNextPNR toolchain. The ECP5 provides a\nsufficient variety of resources for such a project, while still being small\nenough to make resource utilization an important consideration.\n\nI defined an extensible OCaml\ninterface,\nwhich would then be implemented by each puzzle’s solution. 
Along with a NextPNR build\nscript and a top-level board wrapper (to set up the clock, UART and other peripherals), it\nmade it very easy to add a new design to the project; I could write the top-level of each\npuzzle solution as a module that implemented the Ulx3s.Design module type, and then add\nit to the list defined in\nbuild.ml.\nI set up something similar for parsing the puzzle inputs as well, which meant that all of\nthe parsing and setup code could be shared between the simulation tests as well as for\ninterfacing with the actual FPGA board.\n\nDay 4: Sliding window with higher-order functions\n\nDay 4’s puzzle entails taking an ASCII\nword-search grid, and searching for occurrences of the string “XMAS” in any\norientation: horizontal, vertical, diagonal, or reversed. This lends itself\nnicely to a “convolution” approach, where we use a shift register to store the\nmost recent 4 rows of the grid, and compute an appropriately-sized sliding\nwindow on each cycle.\n\nOne tricky part is that we actually need several different-sized sliding\nwindows (1x4, 4x1, 4x4) to match horizontal, vertical, and diagonal strings\nrespectively. This is further complicated by the fact that part 2 requires\nsearching for the string “MAS” in an X shape, while ignoring the squares that\ndon’t overlap with the X.\n\nHardcaml makes this quite nice to implement, as we can implement a higher-order\nfunction to construct such a sliding window for any given dimensions, and then\napply a provided function to check the sliding window against some condition\nand output a result. We can take advantage of OCaml types to represent the\nvariable-dimension window as a nested array at build time, but still flatten it\ndown to a fixed-width signal when generating the actual RTL.\n\n(* For each possible orientation (horizontal, vertical, and both diagonals,\n   build a sliding window to check for the XMAS. 
*)\nlet part1_count =\n  let pattern = List.map [ 'X'; 'M'; 'A'; 'S' ] ~f:char_to_xmas in\n  [ (* Horizontal *)\n    make_sliding_window ~width:4 ~height:1 ~check_fn:(fun window -&gt;\n      window |&gt; window_to_row |&gt; list_match_reversible ~pattern)\n    (* Vertical *)\n  ; make_sliding_window ~width:1 ~height:4 ~check_fn:(fun window -&gt;\n      window |&gt; window_to_col |&gt; list_match_reversible ~pattern)\n    (* Forward diagonal *)\n  ; make_sliding_window ~width:4 ~height:4 ~check_fn:(fun window -&gt;\n      window |&gt; window_to_diag |&gt; list_match_reversible ~pattern)\n    (* Reverse diagonal *)\n  ; make_sliding_window ~width:4 ~height:4 ~check_fn:(fun window -&gt;\n      window |&gt; window_to_rev_diag |&gt; list_match_reversible ~pattern)\n  ]\n  |&gt; reduce ~f:Uop.( +: )\nin\n\n(* There are several possible orientations in which the X-shaped MAS can match,\n   check all of them. *)\nlet part2_count =\n  let pattern = List.map [ 'S' ; 'M' ] ~f:char_to_xmas in\n  make_sliding_window ~width:3 ~height:3 ~check_fn:(fun window -&gt;\n    match window with\n    | [ [ c0; _; c1 ]; [ _; center; _ ]; [ c2; _; c3 ] ] -&gt;\n      let center_match =\n        center ==:. char_to_xmas 'A' in\n      let pair1_match =\n        list_match_reversible ~pattern [ c0; c3 ] in\n      let pair2_match =\n        list_match_reversible ~pattern [ c1; c2 ] in\n      center_match &: pair1_match &: pair2_match\n    | ... )\nin\n\n\n\n\n\nDay 7: Superpipelined brute-force search\n\nDay 7’s puzzle involves filling\noperators into a provided sequence of numbers, to try to make an arithmetic\nexpression that evaluates to some target number. This lends itself very nicely\nto a brute-force solution, iterating over every combination of operators and\nevaluating them to see if they match the goal. Fortunately, FPGAs are\nwell-suited for such tasks, and Hardcaml makes it very easy to construct an\nextremely deep pipeline. 
Each pipeline stage takes the current accumulator,\nevaluates it against the next operator/operand, and then passes down the\nremaining input to the subsequent stage. This lets us average one attempt per\nclock cycle, allowing us to test hundreds of millions of operator combinations\nin a matter of seconds.\n\nmodule Operator = struct\n  module Cases = struct\n    type t = Add | Mul | Cat | Nop\n    [@@deriving sexp_of, compare, enumerate]\n  end\n  module Enum = Hardcaml.Enum.Make_enums (Cases)\n  include Enum.Binary\nend\n\n(* Each stage of the pipeline contains the current accumulator, the goal value,\n   and the remaining operands and operators *)\nmodule Pipeline_state = struct\n  type 'a t =\n    { valid : 'a\n    ; accum : 'a [@bits accum_bits]\n    ; goal : 'a [@bits accum_bits]\n    ; operands : 'a Operand.t list [@length max_seq_len - 1]\n    ; operators : 'a Operator.t list [@length max_seq_len - 1]\n    }\n  [@@deriving hardcaml ~rtlmangle:\"$\"]\nend\n\n\n\nEngaging with the academic community\n\nWe love getting involved with the academic community and seeing all of the cutting-edge\nresearch happening in the space. Last year, a few of us presented Hardcaml and our work on\naccelerating MSM at the ISFPGA 2024\nconference.\nWe really enjoyed meeting everyone there, and attended again this year as a sponsor of the\nevent. We demoed a visualization of one of the Advent of FPGA puzzle solutions at our\ntable, which got quite a bit of interest and gave us a chance to talk further about why\nHardcaml is so fundamental to our development process at Jane Street.\n\nIt was great to connect with the research community and hear about all of the\nexciting work going on in the FPGA space. We’ll also be at IEEE FCCM, FOSSi\nLatch-Up, and FPL this\nyear. 
We’d love to chat with you, and you can even snag some limited-edition Hardcaml\nswag!\n\n\n\nHow can I use Hardcaml?\n\nIf this piqued your interest in playing with Hardcaml, check out the Hardcaml\nGitHub page for all of our open-source\nHardcaml libraries, tools, examples, and tutorials. Be sure to install the bleeding-edge\nopam repository to get all of our latest\nimprovements and features.\n\nAlso check out this interview with Andy Ray (creator of Hardcaml) on Jane Street’s podcast\nSignals & Threads, as well as the\n“OCaml All the Way Down”\ntech talk on how we use OCaml to build our FPGAs and supporting software from the ground up.\n\nCome work with us\n\nWe’re always looking to grow our team! If you’re excited about the idea of applying\nsoftware development techniques and tools to design hardware more efficiently, and are\ninterested in building FPGA accelerator systems with us at Jane Street, check out our\nexperienced FPGA\nengineer and new grad\nFPGA engineer\npositions. We’ll also soon be hiring interns for Summer 2026; see this\nlisting\nfor more details and to be notified when the application opens.\n\nIf you’re currently pursuing or are soon starting a PhD in computer engineering or a\nrelated field, check out our Graduate Research\nFellowship\nprogram as well!\n\nAn Open Challenge\n\nHardcaml isn’t the only Hardware Description Language of its kind. There are\nmany other open-source HDLs out there, but it’s not always easy to compare the\nusability of different languages against each other, especially when\nconsidering them for larger-scale projects. I think these Advent of Code\npuzzles are a great way to experiment with new languages: they involve not only\nclever hardware optimizations (unrolling, heavy pipelining), but also more\nmundane things like sequencing, input parsing, and resource sharing. 
The\nrelatively low complexity of the puzzles also makes it much easier to\nunderstand and compare implementations side-by-side.\n\nI’ve also really enjoyed seeing others’ attempts at similar projects, such as this\nimplementation of 2022 Day 1 in\nClash\n(a Haskell-based hardware development language), and this design for 2024 Day\n24\nimplemented in a 130nm silicon process, which the creator is actually working on getting\nmanufactured into a physical chip through TinyTapeout!\n\nSo I’d like to issue a challenge: If you have a favorite HDL (or even\na High Level Synthesis framework) that you’d like to show off the capabilities\nof, try using it to solve some puzzles this December when Advent of Code 2025\nrolls around, and share your results with the community!\n",
        "url"      : "https://blog.janestreet.com/advent-of-hardcaml-2024/",
        "image"    : "https://blog.janestreet.com/advent-of-hardcaml-2024/snowglobe_camel.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "How we accidentally built a better build system for OCaml",
        "date"     : "January 24, 2025",
        "authorId" : "amokhov",
        "author"   : "Andrey Mokhov",
        "tags"     : [],
        "minsToRead" : 6,
        "content"  : "A “build system” is one of the most important tools in a developer’s\ntoolbox. Roughly, it figures out how to create runnable programs from\na bunch of different source files by calling out to the compiler,\nsetting up and executing test suites, and so on. Because you interact\nwith it daily, above all it has to be fast –\nbut it also has to be flexible.\n\nAround 2012 we were growing dissatisfied with OMake, then one of\nOCaml’s standard build systems, and decided to build our own; we\ncalled this new system Jenga. It worked quite well for us, and we\nthought the broader community might find it useful. So we decided to\nrelease Jenga. We hoped that when other people tried it, they’d like\nit, and maybe even contribute back to it. Releasing it would also make\nit easier for us to open source our code.\n\nHa! What actually happened is that nobody really wanted to use Jenga.\nFor one thing it didn’t work on Windows. But also, to adopt Jenga was\nin effect to adopt the whole “Jane Street way” of building OCaml. The\nadoption of Jenga by those we hoped would embrace it was weak enough that\nwe actually decided to un-open source it. And so we were back to the\nsame place as before.\n\nBy 2016 we had had enough of this, and decided to make a simple\ncross-platform tool, called Jbuilder, that would allow external users\nto build our code without having to adopt Jenga in full, and would\nrelease us from the obligation of rewriting our builds in\nOCamlbuild, then an emerging\nstandard for building OCaml projects.\n\nJbuilder understood the jbuild files that Jenga used for build\nconfiguration and simply executed all required compilation commands in\na topological order. It wasn’t a build system in the usual sense: it\nwould simply re-execute all commands every time (instead of only\nre-executing commands whose inputs have changed).\n\nJbuilder gets popular and becomes “Dune”\n\nThen something strange happened. People loved Jbuilder. 
They started\nusing it to build not just our packages but their own, too. At first\nwe didn’t really understand this. Jbuilder wasn’t a real build system,\nafter all. It was just meant to be a little compatibility shim.\n\nWhat we realized, eventually, was that the compelling feature was\nspeed. It turned out that Jbuilder was really a lot faster than the\nother options, compiling our projects something like 5x faster than\nOCamlbuild. That, plus the system being portable and easy to hack on,\nwere the things that mattered for early adopters and contributors.\n\nSo, in collaboration with OCaml\nLabs\n(and today, Tarides), we started working on\nmaking Jbuilder into more of a real build system, adding more of the\nfeatures that would be required for it to be a useful tool for the\nbroader open-source world.\n\nAnd then, we ran into another problem. The name.\n\nIt turned out there was already a Borland Java IDE called\n“JBuilder.”  The system was\nlong defunct, and we even went to the trouble of finding the current\nowners of the copyright and asking them if they’d mind us using the\nname.  But, no dice.\n\nSo we decided to pick a new name.  We did a bit of community\noutreach, and “Dune”\nemerged as the winning name.\n\nIn the meantime, Dune’s popularity had exploded. People really began\nusing it in earnest, and we found ourselves in a somewhat ridiculous,\nself-inflicted situation: we now had two full build systems to\nmaintain and support.\n\nJenga vs. Dune\n\nIt became clear to us that Dune was the better system—a rethought\ndesign, faster for most people’s builds, with wider adoption, and a\nbetter API and user experience—which brought up the question “when\nwill we migrate Jane Street onto Dune?” Considering the provenance\nof the tool this felt like an absurd question, but oh well, that’s\nwhere we’d ended up.\n\nThe answer was inevitably “next year,” since the build systems team\nhad plenty to work on simply keeping up with our growing codebase. 
(In\n2016, when Dune started, we had 4M lines of OCaml code; today we have\n65M, plus 5M lines of Python.) A migration to Dune was daunting enough\nthat we never quite fully embarked on the mission. But it wasn’t so\ndaunting as to keep us from estimating, aspirationally, that it might\nwell happen in the next six to twelve months.\n\nIt was only last year that we decided to finally rip the band-aid off.\nWe’d grown the build systems team to five full-time engineers, so we\nfelt that we finally had the strength to tackle this monster we’d been\nafraid of for so long.\n\nDune subsumes Jenga within Jane Street\n\nOne large chunk of work we didn’t really appreciate in the beginning was\njust making Dune scale to our huge codebase. Dune was quite fast\nexternally—but that’s in part because most users built relatively tiny\nthings compared to Jane Street’s 70M-line repository.\n\nAnd Jenga hadn’t stood still, either. Over a decade, we had improved\nthe implementation to deal with the growth of our codebase. That\nproduced a system that was pretty well-optimized for scale, and\ncarefully tailored to our monorepo’s requirements.  Now, much of that\ngood optimization work had to be translated to Dune.\n\nThere were more mundane problems. The build system is called by a\nvariety of different workflows, notably from three different\neditors—Vim, Emacs, and VSCode.  Each one sadly had its own custom\nintegrations with Jenga that had to be migrated to use Dune one by\none.\n\nBut after more than a year of focused work we’re finally done: our\ncodebase is now built by Dune. At the time of the switch, Dune’s\nperformance was across the board as good or better than Jenga’s, and\nmuch better in some cases. In particular, builds where most of the\nbuild work is already in the cache (which is a surprisingly common\ncase!) have gotten 2-3x faster.\n\nA lot of what we’ve done to improve Dune’s performance will get open\nsourced, and some already has. 
We’re keen not to end up with two build\nsystems again—a Jane Street fork of Dune and the external Dune—so\nwe’re putting a lot of thought and energy into upstreaming our changes\nwhere we can.\n\nDune is a very good foundation for doing new things. Some of that is\nbecause Dune’s codebase is simpler and easier to work with; some\nbecause we can just focus on one system. But, features like\ndistributed builds, shallow builds (also known as “builds without the\nbytes”),\nand cached loading of the build graph itself are all closer in reach\nthan ever.\n\nThe exciting thing going forward is that we have a single system and\nour velocity in improving it is going way up. The team has also grown\nto 12 full-time engineers, in New York, London and Singapore, which\nmeans we’re now working on Dune 24 hours a day.\n\nIt’s been a long and sometimes meandering path, and certainly hasn’t\nunfolded the way we might have planned.  But we think that the end\nresult is good for OCaml’s build-system story, both within and beyond\nJane Street’s walls.\n",
        "url"      : "https://blog.janestreet.com/how-we-accidentally-built-a-better-build-system-for-ocaml-index/",
        "image"    : "https://blog.janestreet.com/how-we-accidentally-built-a-better-build-system-for-ocaml-index/dune-jenga.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Developer education at Jane Street",
        "date"     : "October 4, 2024",
        "authorId" : "abauer",
        "author"   : "Aaron Bauer",
        "tags"     : [],
        "minsToRead" : 8,
        "content"  : "Like most places, Jane Street largely teaches developers through a kind of apprenticeship\nmodel. A team matching process tries to thoughtfully match new devs to a team that suits\nthem; and from there carefully chosen projects, one-on-one mentorship, code review, and\nclose collaboration with people “on the row” – teammates sitting near you – does most of\nthe rest.\n\nBut we also put a lot of effort into more formal, classroom-style teaching. People who\nwork here are curious and and we’ve found that bona fide classes do more than cultivate\nnew skills; they help spread knowledge, including the normally tacit knowledge about what\neverybody else works on, who the subject-matter experts are, and what hard lessons we’ve\nlearned in recent history as we’ve built new systems and maintained old ones.\n\nBootcamps and Teach-ins\n\nThere are two main sets of classes that developers take when they start at Jane Street:\nOCaml Bootcamp, which is a kind of OCaml-and-our-tools-101, and Dev Teach-ins, a\ncollection of deeper and more specialized classes.\n\nOCaml Bootcamp is an immersive introduction to the language that usually lasts one or two\nweeks. It starts with basic exercises to familiarize you with OCaml’s syntax and then\ngradually ramps up toward more open-ended challenges that have you building small\napplications using our foundational libraries and tools. Along the way there’s another,\nsecret purpose: OCaml Bootcamp is where we introduce you to the most important of our\ndeveloper tools, like the build system, testing\nframework, documentation system,\neditor integrations, and so on – it’s a ton of stuff, and easier to learn hands-on, in\nthe context of solving problems.\n\nBootcamp is generally done independently by new full-time hires, and in a classroom form\nfor interns.  
There are lectures, TAs, some supplementary videos and reading, and students\nare also encouraged to read along in Real World\nOCaml, the definitive textbook on the language\nco-authored by our own Ron Minsky.\n\nTeach-ins don’t start until a little later in your tenure, through the middle of the first\nyear, after you’ve had some time to soak in actual project work. Among other things,\nthey’re designed to help you take a step back and get a broader view of what’s happening\naround the firm. It can just be too easy to put blinders on once you’re settled into a\nteam.\n\nAt the same time, teach-ins introduce students to a wide range of important tools,\ntechniques, and systems.  And not just in a one-off, “here’s a quick blog post about it”\nway, but with time to slow down and work through hands-on exercises.\n\nEach teach-in course lasts 2-3 full days in one of the dedicated Classroom spaces in our\noffices.  Developers\nare strongly encouraged to attend at least some of the courses, and managers make a point\nof making space for participants to step back from their ordinary work and really take the\ntime to learn something new.\n\n\n\nThe Dev Teach-in curriculum\n\nTo get a taste of it, here’s the current curriculum. It’s by no means a complete picture\nof everything we find interesting—the courses and their material are constantly\nevolving, and there’s also some flexibility in the exact courses each participant takes.\n\nTesting\n\nDone right, tests can be pleasant to write, easy to read and update, and fast and\ndeterministic to run. In a system where you’ve done the work to make this possible,\ntesting is actually fun, and you’ll find yourself doing more of it, which in the end will\nhelp you ship more reliable software, faster.\n\nThe goal of this teach-in is to show students some of the tools and approaches that Jane\nStreet has built up over the years to achieve these goals. 
It does this by showing what it\nlooks like to work in a system that’s built for testability from the ground up.\n\nBesides covering workhorse tools like expect\ntests and property-based tests\n(using a library called Quickcheck), the teach-in focuses on several tools for writing\ngood tests at larger scopes:\n\n\n  \n    Datafetcher, an internal library which automates testing processes (like config\ngeneration) that depend on external data sources that are not easy to simulate.\n  \n  \n    Async.Time_source, part of Jane Street’s open source Async library, as a tool for\nwriting testable systems with time-dependent behavior.\n  \n  \n    Netkit, an internal library for writing testable systems that use the network.\n  \n\n\nMarketdata\n\nMarketdata is a fundamental part of our business, and a shaping force in our technology,\ninfluencing everything from datacenter design to trader spreadsheets. The marketdata\nteach-in is designed to give people who aren’t marketdata devs a sense for what it’s\nlike to work with under the hood. The teach-in involves building a marketdata library from\nscratch.\n\nIt covers the following topics:\n\n\n  \n    how an order book works\n  \n  \n    the design of a typical exchange feed (including reliable multicast)\n  \n  \n    the performance constraints that influence the design of our\nmarketdata APIs\n  \n  \n    the challenges of feed normalization\n  \n  \n    common gotchas for marketdata clients\n  \n\n\nAdvanced Functional Programming\n\nThis teach-in, designed and run by our OCaml Language team, covers a few advanced\nfunctional programming techniques relevant to OCaml language features, library design, and\nrefactoring.\n\nStudents start out by seeing how smart use of type system features like parameterized\ntypes and GADTs can make APIs safer and more expressive. 
Practical exercises include\nAPI-directed debugging of a system for distributing symbology information to order\nengines.\n\nNext, students learn how to take advantage of basic algebraic structures to define a\nhighly general, pure data structure called a finger tree—so general in fact that various\nother general-purpose data structures fall out of different instantiations of the API.\n\nThe teach-in culminates with an exploration of techniques like continuation-passing style\nand defunctionalization. Students apply these to optimize a parallel job system, and then\nbring everything together in an exercise implementing automatic differentiation from\nscratch (and using said differentiation to power an image enhancement application).\n\nOCaml Performance\n\nThis teach-in focuses on what tools we have to determine where an application is spending\nits time, and (perhaps more importantly) why. Concretely, students:\n\n\n  \n    Learn about the basic tools available for measuring performance: time-stamp counters,\ninline benchmarks, perf, memtrace (an OCaml-specific tool for measuring allocations),\nand magic-trace\n  \n  \n    Learn the pitfalls of “microbenchmarks” and how to avoid them\n  \n  \n    Write simple programs which illustrate how performance is dictated by the architecture\nof the underlying machine—in particular by writing programs that explore the cache\nhierarchy and learning some low-level details of how the hardware itself operates\n  \n  \n    Understand how to measure performance in the average case, as well as in rare (but\ncritical) code sections\n  \n  \n    Learn some of the peculiarities of OCaml and its foreign function interface\n  \n\n\nDeveloping Web Applications with Bonsai\n\nAt Jane Street a lot of our UIs used to be old curses-style command-line tools, and we\nstill have plenty of those – they are fast, keyboard friendly, and super flexible – but\nincreasingly we build web UIs, too. 
We’ve made our own frontend framework, called Bonsai,\nthat was inspired by React and especially by Elm (though its design has diverged\nreasonably far from those by now).\n\n\n  \n    The exercises in this teach-in take you through some of the basics\nof the framework: the virtual DOM, effects, managing state,\nre-computing that state incrementally, and managing control flow in\na typical web application.\n  \n  \n    But the bulk of the work is building an example app, designed around a partially\nrendered table of live-updating marketdata.\n  \n  \n    Students leave not just with a completed app but with some experience using Chrome’s\ndebugging and performance tracing tools.\n  \n\n\nSystems Debugging\n\nThe goal of this course is for students to develop a foundation in how computer programs\ninteract with the operating system and the hardware below it, and to build an intuition\nfor where to look when things go sideways. It’s designed to level up students’ systems\nadministration skills. Students work through material covering:\n\n\n  \n    Advanced usage of the shell, including subshells, job control and\nsignals, pipelines and redirection, parameter expansion, quoting,\nand an AWK tutorial\n  \n  \n    The purpose and design of the Linux operating system, with exercises\non packaging programs and reverse engineering them with strace,\ncoredumpctl, and gdb\n  \n  \n    Deep dives into memory, including the kernel’s view of memory, and\nmemory virtualization; CPU, the scheduler, and interrupts; network\nmonitoring and debugging; and filesystems, disks, and caches\n  \n  \n    Debugging the behavior of the kernel itself\n  \n\n\nTaking education seriously\n\nOne takeaway from all of this is that education is of central importance here.  You can\nsee it from the physical space and the way we use it.  We’ve had dedicated classroom space\nin all of our offices for years, and it gets used constantly. 
It’s just part of the\nculture, and it’s treated as a first-class thing – we don’t think of these classes as a\ndiversion from “real work” but as important work in itself.\n\nThat’s part of the reason that we’ve started hiring dedicated dev\neducators who spend their time\nthinking about how to improve programs like Bootcamp and the teach-ins, develop entirely\nnew materials, and help the people who run these programs to become even better teachers.\n",
        "url"      : "https://blog.janestreet.com/developer-education-at-jane-street-index/",
        "image"    : "https://blog.janestreet.com/developer-education-at-jane-street-index/classroom.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "ICFP 2024",
        "date"     : "August 29, 2024",
        "authorId" : "nmatschke",
        "author"   : "Nailen Matschke",
        "tags"     : [],
        "minsToRead" : 11,
        "content"  : "It’s no secret that Jane Street is an active participant in the programming language\ncommunity, and we’re excited to be attending ICFP 2024, the\nInternational Conference on Functional Programming, in Milan next week! Most members of\nour OCaml Language team will be there, and as usual, we look forward to sharing our work\nwith the wider community. Please see below for a full list of papers and talks that Jane\nStreet folk are involved in.  Note that a lot of these are collaborations of one kind or\nanother with researchers outside of Jane Street.\n\n\n  Oxidizing OCaml with Modal Memory Managament\n  Arrows as applicatives in a monad\n  A Non-allocating Option\n  Labeled Tuples\n  Mixed Blocks: Storing More Fields Flat\n  Designing interrupts for ML and OCaml\n  Pattern-matching on mutable values: danger!\n  Rethinking the Value Restriction\n  Flambda2 Validator\n\n\nWe’re doing work on many different areas in OCaml: type-system features, improvements\nto code generation and register allocation, better inlining, etc.\n\nBut a big focus of our work in the last couple of years has been around extensions to the\ntype-system to give users more control over performance-relevant aspects of their program.\nThis includes Rust-like control over patterns of memory management for avoiding heap\nallocation and garbage collection and for enabling data-race free parallel\nprogramming.  This has led to\nus investing in extending OCaml’s type systems to support modal types, supporting modes\nlike local and unique, as\nwell as a kind system for allowing us to specify unboxed memory\nlayouts.\n\nUpstreaming our changes\n\nWe’re doing all of this on our own branch of the OCaml\ncompiler.  But we really don’t want\nthese features to remain just with us forever.  We’re modeling this on the work that was\ndone by OCaml Labs on Multicore\nOCaml. 
That project lived on its own fork for a\nperiod of years, but eventually was upstreamed, after a lot of consideration and review.\n\nIn the long run, we hope to do the same with our extensions. That’s going to take some\ntime, of course, as well as some convincing of the broader OCaml world that our changes\nare worth upstreaming.  Some of that work will happen by publishing papers, like the ones\nmentioned above.\n\nBut we don’t think papers are enough here: it’s also important to give people a chance to\nkick the tires on these new language features.  That’s harder than it sounds, despite the\nfact that our compiler is already open source.  That’s because a lot of these features\nonly really take flight when you have an ecosystem of libraries that support them.  We\nhave those libraries internally, but for compatibility reasons we end up erasing their use\nof our language extensions when we release them publicly.\n\nTo fix that, we’ve put together a “bleeding-edge” opam\nrepo that uses both\nour compiler and the\nun-bowdlerized version of our libraries,\nso you can experience these type-system features the same way we do.\n\nWe’ll also have laptops set up to use our branch at the conference, and would love to show\nyou these features in person and let you try them out.\n\nOur extensions at work\n\nBelow, I’ll highlight a few examples of how we’re using these new features in real code.\n\nA New Bonsai API\n\nBonsai is the frontend web framework we use\nto build the vast majority of web apps at Jane Street. 
It features a functional\nuser-facing API, similar in spirit to the Elm\narchitecture, but with a different approach to\nmanaging state, and with more powerful tools for optimizing the incremental performance of\nthe view calculation.\n\nFundamental to this model is Bonsai’s “two-phase” approach:\n\n\n  A graph-building phase, where a DAG of computations is constructed.\n  A runtime phase, where data flows through the computation graph, driving the dynamic\nbehavior of the application.\n\n\nThe static nature of the graph is critical for providing a sane model of\nper-component state, and also makes it possible to share the computation of subgraphs\nacross multiple components. The phase distinction also enables certain kinds of\noptimizations to be performed before runtime.\n\nHowever, enforcing that applications have the correct structure isn’t trivial, and the\n“old” Bonsai API often proved a hurdle for newcomers. Though it was type-safe, its model\nof Computation.ts composed of other Computation.ts and Value.ts was complex, and it\nproved all too easy to accidentally construct a _ Computation.t Computation.t or a _\nComputation.t Value.t, both of which were always a bug, one that often presented itself far\naway from where the mistake was made.\n\nThe solution? During the graph-building phase, Bonsai provides users with a witness value\nat mode local, which they are then required to provide to various functions. 
Here’s a\nsimplified example of the underlying pattern.\n\ntype phase1_value\ntype phase1_witness\ntype phase2_value\n\nval only_callable_in_phase1 : phase1_witness @ local -&gt; phase1_value\n\nval run_phase1 : (phase1_witness @ local -&gt; 'a) -&gt; 'a\nval run_phase2 : phase1_value -&gt; phase2_value\n\n\n\nThis leverages the compiler’s escape analysis for local values, with the\nphase1_witness serving as proof we are in the correct phase.\n\nFor a deeper dive into the theory behind Bonsai, consider checking out Leo White’s Arrows\nas applicatives in a\nmonad\nHOPE talk.\n\nUnboxed Types and Mixed Blocks\n\nA major push of the OCaml Language team this year has been our work on unboxed types.\nThe basic idea of unboxed types is to allow for new types with a different layout\nin memory than traditional OCaml data.  These layouts are part of a broader kind system we’ve\nadded to the language.\n\nWe use the kind “value” to describe the types of ordinary OCaml values. So, we might\nwrite an ordinary polymorphic array like this:\n\nmodule Array : sig\n  type ('a : value) t\nend = struct\n  type ('a : value) t = 'a array\nend\n\n\n\nBut there are a number of other kinds as well, such as immediate, the kind of immediate\n(non-pointer) values, or word, the kind of unboxed nativeint#s.  We can also express\nunboxed structures, like #(int64# * float#), an unboxed pair of unboxed numbers, which\nwould have kind bits64 & float64.\n\nBut this all gets tricky when you get to defining structures, like this one:\n\ntype mixed = { symbol : string; price : float#; size : int64# }\n\n\n\nHere, the type mixed has a mix of layouts within it, and this clashes with the current\nway OCaml represents memory. 
Effectively, an OCaml object must either contain only things\nthat match the traditional OCaml memory layout, or be opaque, in which case it can\nhold arbitrary data, but won’t be scanned by the GC and so can’t contain any pointers.\n\nIn order to support types like mixed, we needed to add a mixed block to OCaml’s memory\nrepresentation.  The design choices here are pretty tricky, and you can hear more about\nthe details in Nick Roberts’ talk about Mixed\nBlocks.\n\nPolymorphism, or Lack Thereof\n\nIt should be no surprise that Streeters love OCaml’s rich and lightweight type system. Its\npervasive support for polymorphism makes it much easier to write certain constructs than\nthe equivalent in other languages, while still providing compile-time safety. However, as\nwe add kinds and modes, it’s hard to make those available with the right level of\npolymorphism.  This is a thing we’re actively working on, but it’s going to take time to\nget right, and in the meantime, we have less polymorphism at the kind and mode level than\nwe really want.\n\nWithout such polymorphism, developers are often forced to write the same thing twice, or\ncome up with tortured solutions using existing tools. For example, consider the identity\nfunction:\n\nlet id : 'a. 'a -&gt; 'a = fun x -&gt; x\n\n\n\nThis, unfortunately, only works for values at mode global. If we wanted to implement\nit for, say, values and float64s, at either mode global or local, we’d need four\ndistinct functions:\n\nlet id_value_global : ('a : value). 'a @ global -&gt; 'a @ global = fun x -&gt; x\nlet id_value_local : ('a : value). 'a @ local -&gt; 'a @ local = fun x -&gt; x\nlet id_float64_global : ('a : float64). 'a @ global -&gt; 'a @ global = fun x -&gt; x\nlet id_float64_local : ('a : float64). 'a @ local -&gt; 'a @ local = fun x -&gt; x\n\n\n\nAs somewhat of a stopgap, we’ve implemented\nppx_template, which essentially adds\nC++-style “templates” (😱) to the language. 
Users can write some snippet of code once,\nspecify the kinds and/or modes for which to generate bindings, and the ppx stamps out a\nversion of said code (modulo name mangling) for each.\n\nWith the ppx, the above instead becomes:\n\nlet id : ('a : k). 'a @ m -&gt; 'a @ m = fun x -&gt; x\n[@@kind k = (value, float64)] [@@mode m = (global, local)]\n\n\n\nBeyond significantly improving its readability, we expect that this brings it closer to\nwhatever syntax we ultimately end up choosing for kind and mode polymorphism.\n\nOther Odds and Ends\n\nThis might seem like a lot, but it’s only a fraction of the work we’re doing every day!\nOne feature which has seen rapid adoption within Jane Street is Labeled\nTuples,\nwhich Ryan Tjoa implemented as an intern project.\n\nWe are also nearing completion of a zero-overhead option type, called or_null. The basic\nidea is to use the C-like representation of a nullable value being either a direct pointer\nto the object in question, or a null-pointer. This works fine, except that to do it\nsafely, you have to make sure to not nest one nullable object inside another.  In this\ndesign, we use the kind system to prevent this kind of nesting.\n\nJane Street and you\n\nWe look forward to seeing everyone at ICFP, to sharing more about\nwhat we’re working on, and to hearing about everyone else’s work as well!\n\nIf this kind of stuff sounds fun, consider\napplying!  We have lots of fun\nroles both directly on our OCaml Language team, and on language-oriented projects across\nthe firm. Take a look here if you\nwant to get a sense of the other kinds of PL-flavored projects we work on!\n\n
        "url"      : "https://blog.janestreet.com/icfp-2024-index/",
        "image"    : "https://blog.janestreet.com/icfp-2024-index/ICFP-2024.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "What the interns have wrought, 2024 edition",
        "date"     : "August 26, 2024",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : ["internship"],
        "minsToRead" : 16,
        "content"  : "We’re once again at the end of our internship season, and it’s time do our annual review\nof what the interns achieved while they were here.\n\nJane Street is a big enough place and the internship is wide-ranging enough that it’s\nimpossible to really cover the full spread of work that interns do here. So, instead,\nwe’ve picked a few interesting projects to focus on, just to give you a sense of the\npossibilities.\n\nHere are the ones I’m going to discuss this year:\n\n\n  \n    Arya Maheshwari wrote a first version of Camels, a Polars-like\ndataframe library for OCaml.\n  \n  \n    Arvin Ding designed a variant of our bin-prot\nbinary-serialization protocol, which was aimed at achieving better writing speed at the\nexpense of compactness.\n  \n  \n    Alex Li worked on improving a time-travel debugger for Limshare, a system we’ve built\nfor sharing risk limits across multiple trading systems.\n  \n\n\nNow, let’s dive into the details!\n\nDataframes for OCaml\n\nTables are a pretty convenient way to organize, think about, and\nwork with data. They show up in all sorts of places: databases, spreadsheets, and a wide\nvariety of so-called dataframe libraries in languages like Python and R.\n\nIn the Python world, there’s the ubiquitous pandas library,\nas well as some more modern, higher-performance alternatives, like\nPolars.  Polars is written in Rust, and is useful to users of both\nlanguages. Many of Polars’ performance advantages come from its parallel\nexecution engine, which is built on top of Rust’s support for fearless\nconcurrency, i.e. Rust’s\nability to provide type-level protection against data-races.\n\nSometimes we want to use dataframes from OCaml, too. To that end, we’ve written\nOCaml bindings for Polars and used them in various applications. 
That’s been great, both\nbecause dataframes are a convenient programming idiom, and because it gives us an easy API\nfor accessing parallelism safely.\n\nBut the Polars experience hasn’t been perfect. We don’t have a good incremental build\nstory for Rust, which, in combination with Rust’s fairly slow build times, makes depending\non Polars a real drag to the development process. Also, we’ve found some limitations and\nbugs in Polars (and our bindings) that have been harder to address than we’d like.\n\nAs it happens, a lot of the performance-oriented language features that make Rust a good\nchoice for Polars are becoming available in OCaml, including our work on data-race free\nparallel programming in OCaml\nbased on our\nsystem\nof modes, and\nflat-and-narrow data representations based on\nunboxed\ntypes.\n\nSo, we decided to experiment with a pure OCaml dataframe library, both because we thought\nit would be easier to use in existing OCaml applications, and because we\nthought it would be a good testbed for exercising the new language features we’re\nworking on.\n\nArya Maheshwari’s task this summer was to build a first version of such a library,\ncalled Camels. The goal was to lay down the bones of the system in a way that would let\nus work out the basic structure and core APIs, before we dive into the\nperformance-sensitive parts of the system.\n\nWe wanted an API that was easy to use and expressive, while still being amenable to\nserious optimization. Arya did this by separating out the syntax, i.e. the underlying\nstructure of the computation, from the semantics, i.e. what the computation actually\ndoes. So, a function like this one:\n\nlet running_sumproduct ~value ~weight ~ordering =\n  let open Expr in\n  let product = float value *. float weight in\n  let sorted_product = sort_by product ~by:(float ordering) in\n  cumsum sorted_product\n\n\n\ndoesn’t actually execute the sumproduct computation. 
Instead, it produces an expression\nthat describes the computation to be run. This phase separation gives you an opportunity\nto compile expressions down to a more efficient form before executing them, which you\nmight do with code like this:\n\nlet execute_running_sumproduct df ~value ~weight ~ordering =\n  Query.select (Query.view df) ~cols:[ running_sumproduct ~value ~weight ~ordering ]\n  |&gt; Dataframe.compile_exn\n  |&gt; Dataframe.execute\n\n\n\nAnother interesting design challenge was how to handle broadcasting.\nBroadcasting allows you to promote a single scalar value to a column before using\nit in column-level operations. Many dataframe\nlibraries make broadcasting implicit, which is both convenient and sometimes very\nconfusing. We made the choice of having explicit broadcasting.  So, you might write\nsomething like this, which creates an expression that adds three to a column:\n\nlet add3 column =\n  let open Expr in\n  int column + broadcast (int' 3)\n\n\n\nIn order to catch improper broadcasts and type mismatches at compile-time, Arya added a\nsimple type-system for expressions. The expression type tracks both the underlying row\ntype and whether it represents a scalar or a full-length column.  For example, if we\nforget a broadcast:\n\nlet add3 column =\n  let open Expr in\n  int column + int' 3\n\n\n\nThe OCaml compiler will produce the following error at the expression int' 3:\n\nThis expression has type (int, Length.one) t\n       but an expression was expected of type (int, Length.input) t\n       Type Length.one is not compatible with type Length.input\n\n\n\nCamels isn’t finished yet. We’re interested in writing alternate backends that exploit\nboth SIMD parallelism and multicore OCaml thread-level parallelism, as well as\nexperimenting with expression fusion and query planning as part of the compilation\nstep. 
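The describe-now, run-later structure at the heart of this design can be sketched in miniature. The following toy is purely illustrative and is not the Camels API: an expression is plain data describing a computation over an int column, and a separate evaluator interprets it.

```ocaml
(* A toy deep embedding, for illustration only (not the Camels API):
   building an expression constructs a description of a computation;
   nothing runs until [eval] walks that description. *)
type expr =
  | Col                  (* the input column itself *)
  | Broadcast of int     (* promote a scalar to a full-length column *)
  | Add of expr * expr   (* element-wise addition *)

let rec eval (column : int list) (e : expr) : int list =
  match e with
  | Col -> column
  | Broadcast n -> List.map (fun _ -> n) column
  | Add (a, b) -> List.map2 ( + ) (eval column a) (eval column b)

(* [add3] is just data at this point; no work has happened yet. *)
let add3 = Add (Col, Broadcast 3)
let () = assert (eval [ 1; 2; 3 ] add3 = [ 4; 5; 6 ])
```

Because the computation exists as data before it runs, a compile step could rewrite the description (fusing operations, planning queries) before execution, which is exactly the opportunity the phase separation creates.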
But Arya’s work this summer has already gotten it to a pretty great starting place!\n\nFaster (and fatter) binary serialization\n\nWe build a lot of latency-sensitive systems that need to respond quickly to incoming data,\nmostly in the form of marketdata.  The core workflow is pretty simple: read in the new\ndata, update your calculations accordingly, and maybe send out a packet in response.\n\nThat’s most of what these systems do, but it’s not all they do: our systems typically also need\nto log information on a transaction-by-transaction basis, for later analysis and\ndebugging.\n\nThis logging needs to be as lightweight as possible, so it doesn’t slow down the rest of\nthe program.  We typically do this by writing the information we care about in a compact\nbinary form to be picked up by another, less latency-sensitive process, which will take\nthat data and then do some final processing and formatting to store it in our logs.\n\nThis serialization is often done in a format called\nBinprot, and we use code-gen syntax\nextensions to make it relatively painless to\ngenerate the serializer/deserializer code. But Binprot, while quite efficient, isn’t\nreally optimized for this use-case.  In particular, Binprot gives up on some speed on the\nwriting side in exchange for reducing the number of bytes required in the serialized\noutput.\n\nThe goal of Arvin Ding’s intern project, then, was to create a library to serialize OCaml\ndata types as fast as possible, sacrificing size along the way.\n\nThe central change to the design was to give up on Binprot’s variable length-encoding of\nintegers.  OCaml’s ints are 64 bits long (well…really 63\nbits, but we\ncan ignore that for now), which means that you’d normally need 8 bytes to represent them.\n\nBinprot doesn’t do that, and instead tries to take advantage of the fact that most ints\nare small, and therefore can fit in fewer bytes.  
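To make the size/speed tradeoff concrete, here is a rough Python sketch of the two strategies (purely illustrative; this is not Binprot's actual wire format, and the tag bytes are made up):

```python
import struct

# Illustrative only: a variable-length encoder branches on each int's
# magnitude, while a fixed-width encoder always writes 8 little-endian
# bytes, mirroring the in-memory layout.

def write_int_variable(n):
    if 0 <= n < 0x80:                      # fits in 7 bits: 1 byte
        return bytes([n])
    if -0x8000 <= n < 0x8000:              # fits in 15 bits: tag + 2 bytes
        return bytes([1]) + struct.pack('<h', n)
    if -0x80000000 <= n < 0x80000000:      # fits in 31 bits: tag + 4 bytes
        return bytes([2]) + struct.pack('<i', n)
    return bytes([3]) + struct.pack('<q', n)

def write_int_fixed(n):
    # Bigger for small ints, but branch-free and memcpy-friendly.
    return struct.pack('<q', n)

print(len(write_int_variable(42)), len(write_int_fixed(42)))          # 1 8
print(len(write_int_variable(10**12)), len(write_int_fixed(10**12)))  # 9 8
```

The variable-length version saves bytes on small ints but pays a branch per int (and can even be larger for big ones); the fixed-width version keeps the serialized bytes in the same layout as memory, which is what makes bulk copying possible.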
Here’s what the integer-encoding code\nlooks like:\n\nlet bin_write_int buf ~pos n =\n  assert_pos pos;\n  if n &gt;= 0\n  then\n    if n &lt; 0x80 (* can be stored in 7 bits *)\n    then all_bin_write_small_int buf pos n\n    else if n &lt; 0x8000 (* can be stored in 15 bits *)\n    then all_bin_write_int16 buf pos n\n    else if arch_sixtyfour && n &gt;= 0x80000000\n    then all_bin_write_int64 buf pos (Int64.of_int n)\n    else all_bin_write_int32 buf pos (Int32.of_int n)\n  else …\n;;\n\n\n\nThis is a great trick, and does buy us a lot of size-reduction.  But it has a real cost,\nboth because of the computation that has to be done for each int to be written, and\nbecause it means that the serialized data representation is pretty different from the\nin-memory representation of the same object.  That prevents you from doing bulk copying of\nthe bytes out, using memcpy, say, which is a very efficient routine that takes full\nadvantage of the ability of the CPU to copy lots of adjacent bytes in parallel.\n\nIf we give up on the variable-length encoding, then we can start using memcpy for some of\nthe serialization work. In particular, adjacent, non-pointer fields in a record can be\ncopied simply by issuing a single memcpy that covers the full range of those fields.\n\nThere were lots of technical challenges here, and it required Arvin to familiarize himself\nwith some tricky and low-level parts of our codebase, and of OCaml itself.\n\nRather than modify the existing ppx_bin_prot syntax extension, Arvin decided to use\ntyperep, a first-class representation of the type of an object, which is in turn\ngenerated by a different syntax\nextension. The serializer then works off\nof the typerep, which is a lot easier to program against than doing this at the syntactic\nlevel directly.\n\nThe project also required some diving into low-level C-bindings, to do some of the\nbit-twiddling that was required.  
It also required a fairly detailed understanding of\nOCaml’s underlying memory representation of types. They relied a lot on this\npage from Real World OCaml!\n\nBut there were further challenges. Arvin had some difficult debugging work tracking down\nsome stray allocations. One key moment came when we realized that Obj.unsafe_ith_field\nallocates (🤯), but only when it is used on a record of only floats (because of floats’\nspecial memory\nrepresentation). So Arvin\nhad to write a custom C version of unsafe_ith_field.\n\nThe end result worked out really well. Performance benchmarks showed it doing better than\nBinprot in every case they looked at.  For small messages, it might be only 10-20% better.\nFor large messages with a lot of non-pointer data, it could be as much as 15 times better!\n\nAnd those improvements have translated to our production systems.  This new protocol has\nbeen rolled out for more than a month, and we’ve seen improvements in tail latencies from\n30-65% in real systems.\n\nWe’re really excited about the end result. We haven’t observed any crashes or incorrect\nserialization/deserialization attempts, which suggests that the many Quickcheck tests\nArvin wrote were not in vain. And the library is fairly generic, so we’re excited to adapt\nit to more use cases throughout the firm.\n\nA time-travel debugger\n\nA key part of Jane Street’s trading systems is our risk controls.  These controls largely\nrest on a set of risk-checking systems that are responsible for enforcing a collection\nof risk rules, with each system managing a particular slice of our trading.\n\nEach risk-checking system has to be allocated some limits, upper bounds on how much risk\nthe trading associated with that system is allowed to take on.  These limits are precious,\nsince they bound the amount of trading we can do.  
And historically, we allocated risk\nlimits statically, based on human-written configuration files.\n\nStatic allocation is relatively simple, but it doesn’t allow us to maximize the use of our\nlimits, since it requires us to predict in advance which systems will need them, and there\nare limits on how well that can be done, even in principle.\n\nThat’s why we built Limshare, a system for dynamically allocating risk limits while\nkeeping a bound on our overall risk exposure.  Limshare is built on a framework called\nAria,\nwhich implements a distributed state\nmachine. In an Aria app, you\nhave a single global log of updates which, when run one after the other, build up the\napplication’s state.  This provides a simple replication and persistence story, since any\nprocess can always reconstruct the state of the system by replaying the global log.\n\nOne nice property of this approach is that the update log is useful for debugging. When\nsomething goes wrong, it’s in theory possible to step over events in the log to\nreconstruct exactly how you got into a bad state.\n\nIn practice, though, we’d only ever built rudimentary tools for replaying Aria messages in\nLimshare. There was an interactive debugger that let you step through updates one by one,\nprinting them out, as well as printing out some useful bits of the state of the\napplication.\n\nThe trouble was that it had no notion of snapshotting. It would just replay messages from\nthe beginning of the day until a target time t, and from then on you could step forward\none message at a time (or jump to another target time). But because you couldn’t go back,\nif you skipped past the interesting part of the stream you’d have to start the replayer\nover again from the beginning of the day.\n\nAlex Li’s project was to make the debugger much faster, by leveraging Aria’s system of\nsnapshots. A snapshot is effectively a summary of the state of the system, as of a\nparticular point in time.  
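The mechanism can be sketched in a few lines of Python (the shapes of the log and state here are hypothetical; the real debugger works over Aria's update log and Limshare's application state):

```python
import copy

# A minimal sketch of snapshot-based replay: apply updates forward,
# record a deep-copied snapshot every N updates, and seek backwards by
# restoring the latest snapshot at or before the target.

class Replayer:
    def __init__(self, log, snapshot_every=100):
        self.log = log                  # the full day's update log
        self.snapshot_every = snapshot_every
        self.snapshots = {0: {}}        # update index -> copied state
        self.state = {}
        self.pos = 0                    # number of updates applied

    def _apply(self, update):
        key, delta = update
        self.state[key] = self.state.get(key, 0) + delta

    def seek(self, target):
        # Going backwards no longer means replaying from the start of
        # the day: resume from the latest snapshot before the target.
        if target < self.pos:
            base = max(i for i in self.snapshots if i <= target)
            self.state = copy.deepcopy(self.snapshots[base])
            self.pos = base
        while self.pos < target:
            self._apply(self.log[self.pos])
            self.pos += 1
            if self.pos % self.snapshot_every == 0:
                self.snapshots[self.pos] = copy.deepcopy(self.state)

log = [('limit', 1)] * 1000
r = Replayer(log)
r.seek(950)          # forward pass, snapshotting every 100 updates
r.seek(901)          # backward: restores snapshot 900, replays 1 update
print(r.state)       # {'limit': 901}
```

The point is that seeking backwards now costs at most one snapshot restore plus a bounded replay, instead of a replay from the start of the day.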
Without snapshots, in order to spin up an application, you need\nto replay all updates from the beginning of time.  With snapshots, you can just start from\nthe latest snapshot, and replay updates going forward from there.\n\nAlex added logic to the debugger to take snapshots of the app state at fixed intervals. To\ngo backwards in time the debugger would find the latest snapshot before the target time,\nand locally replay messages to efficiently construct the state of the world N messages\nback. As part of this work he also added snapshotting logic to Limshare in the first\nplace.\n\nThis debugger has made it easy to reconstruct what happened in the wake of complex\nproduction incidents.  Here’s an example inspired by real events of how you could use the\ndebugger to investigate why a given order was rejected unexpectedly.\n\nFirst, we step forward until we see a reject happen.  Then, we step back in time by one\nstep, until just before the rejection decision was made.\n\n&gt; step-time 10m\n(Decision (Resize_pool 2) REJECTED)\nstop_condition: Breakpoint: Reached a rejected request decision.\ncurrent stream time: 2024-08-26 10:36:23.967645939\n&gt; back-messages 1\n(Request.Resize_pool (pool_id 140737488355330) (downcrash $9_241_233)\n (upcrash $1_391))\ncurrent stream time: 2024-08-26 10:36:23.967491204\n\n\n\nNow, we can print out the state to see what the limit usages were at the relevant scopes,\nas well as the allocation request that was rejected.\n\n&gt; print\nChecked out resources and limits\n┌────────────────────┬─────────────┬──────────────┬─────────────┬──────────────┐\n│ node id            │ resources ↓ │      limit ↓ │ resources ↑ │      limit ↑ │\n├────────────────────┼─────────────┼──────────────┼─────────────┼──────────────┤\n│ kumquat            │ U$6_453_178 │ U$10_000_000 │    U$34_748 │ U$10_000_000 
│\n└────────────────────┴─────────────┴──────────────┴─────────────┴──────────────┘\n\npools\n┌─────────────────┬─────────────┬───────────────────────┬──────────────────────────┐\n│            pool │ risk system │        request bundle │                     size │\n├─────────────────┼─────────────┼───────────────────────┼──────────────────────────┤\n│ 140737488355330 │      nasdaq │ pts2, kumquat         │              ↓ $0 | ↑ $0 │\n│ 140737488355329 │        nyse │ pts1, kumquat         │ ↓ $6_453_178 | ↑ $34_748 │\n└─────────────────┴─────────────┴───────────────────────┴──────────────────────────┘\n\nUndecided requests\n┌───┬────────┬─────────────────┬─────────────────┬─────────────────────────┐\n│ # │   Kind │            Node │            Pool │            Desired Size │\n├───┼────────┼─────────────────┼─────────────────┼─────────────────────────┤\n│ 1 │ Resize │ kumquat         │ 140737488355330 │ ↓ $9_241_233 | ↑ $1_391 │\n└───┴────────┴─────────────────┴─────────────────┴─────────────────────────┘\n\n\n\nThe thing we can see in this case is that pts2’s request was rejected because pts1 had\na large limit reservation already in place. 
At that point, we can dig in more to figure out why,\nlike jumping back in time by five minutes more to see how long the reservation was in\nplace.\n\n&gt; back-time 5m\n(\"Enforcer lease \" (has_lease_until (2024-08-26 10:31:28.713728082-04:00)))\nstop_condition: Time_limit\ncurrent stream time: 2024-08-26 10:31:23.713892045\n&gt; print\nChecked out resources and limits\n┌────────────────────┬─────────────┬──────────────┬─────────────┬──────────────┐\n│ node id            │ resources ↓ │      limit ↓ │ resources ↑ │      limit ↑ │\n├────────────────────┼─────────────┼──────────────┼─────────────┼──────────────┤\n│ kumquat            │ U$6_453_178 │ U$10_000_000 │    U$34_748 │ U$10_000_000 │\n└────────────────────┴─────────────┴──────────────┴─────────────┴──────────────┘\n\npools\n┌─────────────────┬─────────────┬───────────────────────┬──────────────────────────┐\n│            pool │ risk system │        request bundle │                     size │\n├─────────────────┼─────────────┼───────────────────────┼──────────────────────────┤\n│ 140737488355330 │      nasdaq │ pts2, kumquat         │              ↓ $0 | ↑ $0 │\n│ 140737488355329 │        nyse │ pts1, kumquat         │ ↓ $6_453_178 | ↑ $34_748 │\n└─────────────────┴─────────────┴───────────────────────┴──────────────────────────┘\n\n\n\nIt turns out, there was no good reason for a reservation of that size to be out for that\nlong, which made it clear that the reservation was due to a bug.\n\nThis example highlights what’s great about great observability tools: they make visible\nwhat was previously hidden, and thereby dramatically simplify the process of debugging. 
In\nthat way, it feels similar to some other tools we’ve built, like\nmagic-trace and\nmemtrace.\n\nAnother exciting thing about this debugger is that the concept is really very general.\nThere’s nothing about it that in principle limits it to Limshare, and indeed, the Aria\nteam has recently adopted the project, and plans to make it a broadly supported tool\nacross the Aria ecosystem.  There are already several other teams interested in adopting\nit for their Aria apps!\n\nJoin in on the fun!\n\nIf any of this sounds like fun, you should\napply to one of our\ninternships.  One of the great things about our intern program is that you get a chance to work\non real and impactful projects that will really stretch your skills as an engineer, which\nI hope these examples do a good job of demonstrating.\n",
        "url"      : "https://blog.janestreet.com/what-the-interns-have-wrought-2024-edition-index/",
        "image"    : "https://blog.janestreet.com/what-the-interns-have-wrought-2024-edition-index/WTIHW-2024.png",
        "topic"    :  ["technology","internship"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Visualizing piecewise linear neural networks",
        "date"     : "July 22, 2024",
        "authorId" : "richeng",
        "author"   : "Ricson Cheng",
        "tags"     : [],
        "minsToRead" : 5,
"content"  : "Neural networks are often thought of as opaque, black-box function approximators, but theoretical tools let us describe and visualize their behavior. In particular, let’s study piecewise-linearity, a property many neural networks share. This property has been studied before, but we’ll try to visualize it in more detail than has been previously done. \n\nPiecewise-linearity means that a function can be broken down into linear parts, even if the function as a whole isn’t linear.\n\nThe ReLU 1 activation, one of the most commonly used activations, can be broken down into two linear sections which join at the origin.\n\n\n  \n\n\nA basic but widely used neural net architecture just interleaves linear layers with ReLU activations, so that’s what we’ll focus on. Here’s a single layer neural net, with two inputs and a single output neuron with ReLU activation. The two inputs are on the x and y axes, the output is on the vertical z axis.\n\n\n  \n\n\nThe ReLU is off in the left half, and on in the right half. \n\nImportantly, neural nets can only learn continuous piecewise-linear functions. A neural net wouldn’t be able to learn the function below, because the two pieces don’t line up at the boundary.\n\n\n  \n\n\nNow let’s increase the number of output neurons to 8, which gives us a few more divisions2. Each polygon we’ve formed corresponds to some subset of the ReLUs being on, and the rest being off (an activation pattern).   Naively, there should be 2^8 activation patterns, but because we’re constrained to a 2-d plane, only 37 (the 8th central polygonal number) of these are feasible, 32 of which are visible below. We call the whole arrangement of these lines and polygons3 a polyhedral complex. \n\n\n  \n\n\nHere’s a top-down, bird’s-eye view of the same figure (rotated roughly 90 degrees counter-clockwise).\n\n\n  \n\n\nLet’s add a second layer, also with 8 neurons and ReLU activation. 
The second layer lines are drawn thinner here, to distinguish them from the first layer lines.\n\n\n  \n\n\nAnd the 3d view\n\n\n  \n\n\nBecause the composition of two linear functions is linear, within each region carved out by the first layer, the second layer lines are straight. However, when a second layer line hits a boundary, the linear function changes, so the line kinks. It might also terminate if one of the activation patterns is infeasible: for example, if the first layer activation pattern says all the neurons are off, and if the bias is -1, then the only feasible activation pattern in the second layer is for all the neurons to be off as well. \n\nThe algorithm to compute this is pretty straightforward: we just test every activation pattern in every parent region and check if it’s feasible.\n\nprevious_layer_regions = [euclidean_plane]\nfor each layer:\n    regions = []\n    for parent_region in previous_layer_regions:\n        for active_neurons in powerset(layer.neurons):\n            linear_function = compose(layer, parent_region.linear_function)\n            region = solve_constraints(\n                parent_region.linear_constraints,\n                linear_function[active_neurons] &gt; 0,\n                linear_function[~active_neurons] &lt;= 0,\n            )\n            if region is feasible:\n                region.linear_function = linear_function\n                regions.append(region)\n    previous_layer_regions = regions\n\n\n\nBy the third layer, circular structures emerge and are further refined in the successive layers. Regions where the neural net’s output value is higher are given a brighter color than regions with low outputs. \n\n\n  \n  \n  \n  \n\n\nJumping back into 3d, here’s the final output of our neural net.\n\n\n  \n\n\nSo far we’ve only examined a single neural net trained to reproduce the Jane Street rings, but we can visualize how the whole polyhedral complex evolves as the weights of the neural net change. 
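As a quick sanity check of that region count, we can sample the activation patterns of a random 8-neuron layer on a grid (a stdlib-only sketch that samples points, rather than solving exact feasibility constraints the way the pseudocode above does):

```python
import random

# Sample first-layer activation patterns over a grid and compare the
# count against the central-polygonal-number bound from the post.
random.seed(0)
n = 8  # first-layer neurons, as in the post

# Each neuron defines a line w*x + v*y + b = 0 in the input plane.
lines = [(random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1))
         for _ in range(n)]

def activation_pattern(x, y):
    # Which ReLUs are on at the input point (x, y).
    return tuple(w * x + v * y + b > 0 for (w, v, b) in lines)

patterns = {activation_pattern(x / 10, y / 10)
            for x in range(-100, 101) for y in range(-100, 101)}

# n lines divide the plane into at most n*(n+1)/2 + 1 regions
# (the central polygonal numbers); for n = 8 that bound is 37.
bound = n * (n + 1) // 2 + 1
print(bound, len(patterns) <= bound)   # 37 True
```

Sampling can only undercount regions, so the observed pattern count stays below the bound for any choice of weights.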
We start with an untrained neural net whose weights are randomly initialized, and interpolate towards neural nets trained to produce some recognizable shapes (at 0:20 and 0:42). Notice how the untrained weights divide the plane up into just a few polygons, while the trained weights tend to make many polygons. \n\n\n\n\n\n\n\n\n\n\n\n\n\n  \n    \n      while this is a ReLU, the neural networks visualized in this post actually use a LeakyReLU(0.02) activation, because small ReLU networks easily get stuck in local minima when training. &#8617;\n    \n    \n      the output of our neural net is now 8-dimensional, but we pick an arbitrary linear projection so that we can still visualize things in 3-d. &#8617;\n    \n    \n      technically, some of these regions are unbounded and therefore aren’t polygons. it would be more accurate to say “half-space intersections”. &#8617;\n    \n  \n\n",
        "url"      : "https://blog.janestreet.com/visualizing-piecewise-linear-neural-networks/",
        "image"    : "https://blog.janestreet.com/visualizing-piecewise-linear-neural-networks/6_1.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "What the interns have wrought, 2023 edition",
        "date"     : "September 12, 2023",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : ["internship"],
        "minsToRead" : 12,
        "content"  : "We’re once again at the end of our internship season, and it’s my task \nto provide a few highlights of what the dev interns accomplished while\nthey were here.\n\nThe program was big! We had 152 software engineering interns, drawn\nfrom 58 schools across 19 different countries.  And that’s not even\ncounting the 31 tech interns in areas like production engineering, IT\nengineering, network engineering, and more.\n\nThe intern program is so big and diverse that it’s impossible to\nfaithfully summarize it with just a few projects. But, something is\nbetter than nothing, so here are the projects I’ll discuss this time\naround:\n\n\n  \n    Rajeev Godse wrote a query language based on Linear Temporal Logic\nfor querying complex events out of system logs.\n  \n  \n    Semyon Savkin worked on our AI Assistants team, building an\nefficient tokenizer in OCaml.\n  \n  \n    Sasha Hydrie added concurrency support to our tracing syntax, making\nit suitable for use with our asynchronous programming frameworks.\n  \n\n\nNow let’s dive into the details!\n\nA temporal query language\n\nConcord (which we’ve\ntalked\nabout\nbefore) is a platform for building systems that let counterparties\nconnect to us to trade directly with or through us.\n\nConcord’s architecture is built around a single, totally-ordered\nstream of transactions.  One nice thing about this approach is that\nthe transaction stream is a great debugging aid: an incredibly\ndetailed source-of-truth that you can dive into when you’re trying to\nfigure out why the system is misbehaving.\n\nUnfortunately, the tooling we had for digging into this treasure trove\nwas a bit limited.  All we really had was the ability to find and grab\nindividual messages.  But sometimes you want more than that! You want\nto search for sequences of events that match some specified criteria.\n\nWe did have some tooling for this. 
In particular, one of our\nengineers had built a stream query system based on linear temporal\nlogic, or LTL\nfor short.  LTL is a well-studied logic that takes basic propositional\nlogic and adds to it two key operators: next and until.\n\nRoughly, next p means that predicate p holds at the next point in\nthe stream.  And p until q means that p holds\ncurrently, and will continue to hold until q starts holding.\n\nIf it’s not obvious to you how to use these two operators to build\nmeaningful queries, well, join the club.  It can be a bit of a puzzle\nto figure out how to convert meaningful queries into the fairly\nlow-level logical statements that LTL is built on.  To make matters\nworse, the only syntax we had for writing these queries was a fairly\nawkward s-expression based format.  As a result, almost no one used\nthe LTL query engine.\n\nThat’s where Rajeev’s project came in. Rajeev’s goal was to build an\neasier-to-use, SQL-like query language to act as a frontend to the\nLTL query engine.  The language wouldn’t be quite as expressive as\nLTL, but it would be a lot easier to use.\n\nWe don’t really have space to go into detail on how the language\nworks, but here’s an example of a query for retrieving and printing\nout retail orders paired with the first execution received by each\norder:\n\nFIND wholesaling.retail_order.new Retail_order\nTHEN FIRST wholesaling.route.fill.external Fill\n  WHERE .retail_order_id = Retail_order.retail_order_id;\nPRINT\n, Fill.time - Retail_order.time AS order_to_fill_time\n, Retail_order.retail_order_id AS retail_order_id\n, Retail_order.time AS arrival_time\n;\n\n\n\nHe built the parser for this new language on top of\nAngstrom, a\nparser-combinator library for OCaml.  It wasn’t too hard to get a\nworking parser; the biggest challenge was getting good error\nmessages.  
But after some careful wrestling with the system, Rajeev\nwas able to get it to track enough context to generate good error\nmessages in the cases that mattered.\n\nIn addition to getting the basic system in place, Rajeev had time to\nadd a few interesting temporal operators to the language, including:\n\n\n  \n    LAST p BEFORE q, which matches messages M1 and M2 such that\nM2 satisfies q and M1 is the last message satisfying p\nbefore M2.\n  \n  \n    NO MESSAGE p BEFORE q, which matches M satisfying q such that\nno messages before M satisfy p.\n  \n\n\nAll in, the project was a real success. The new temporal query\nlanguage has become the go-to tool on the team for debugging\nperformance problems, and there have been requests from other teams to\ngeneralize the language so it can be used against other systems as\nwell. This feels like an exciting new part of our toolkit for\nsupporting production systems.\n\nEfficient token-counting in OCaml\n\nIf you’ve ever used an AI chatbot you’ll appreciate the importance of\nkeeping track of your token usage—both as a way to keep costs in\norder and to mind rate limits. Surfacing these token counts in real\ntime to users helps them understand and moderate their own usage.\n\nThe project we needed tokenization for is our own web front-end to the\nvarious AI chatbots out there.  We started off using OpenAI’s\ntokenization library, tiktoken, which we set up by\nstarting a Python server that we could hit over HTTP.\n\nBut, this was a bit of a grungy setup, and we only had access to the\ntoken counter from the server, not the client. A pure OCaml\nimplementation would solve both problems at once, since our client is\nan OCaml program too.\n\nSemyon Savkin’s intern project was to write such an\nimplementation. Token counting is not trivial—it’s not like you just\nsplit your input string on spaces—and an early challenge was finding\nan OCaml regex library that supported all the features used by the\nregex in tiktoken. 
Nothing that we found was suitable, especially\ngiven the constraint that it had to work in the browser. Fortunately,\nthe regex was simple enough that it was not too difficult for Semyon\nto handcraft the code for the automaton.\n\nThe goal at first was to check that the program’s behavior conformed\n100% with the reference implementation, so Semyon wrote a stress test\nprogram to spot any differences.  But it soon became clear that this\nwas too strict of a requirement, since even a slight difference in the\nunicode version can cause (very rare) tokenization differences. So\nSemyon needed to find a way to relax the tests enough to allow for\nsmall deviations, without losing too much bug-finding power.\n\nOur initial implementation used a very functional style, with lists\nand maps. The code was nice and simple, but just not fast enough. So,\nSemyon spent some time profiling and experimenting, and ended up with\na more imperative implementation leveraging hash-tables and arrays,\nwhich, along with algorithmic improvements, made a big difference.\n\nBy the end of the internship, Semyon had produced two fully\nfunctioning tokenizers. We compared the results against both the\nPython server and also the reference implementations as accessed\nthrough the Python API, which were written in Rust. When measured in\nbytes per microsecond, we blew the Python server out of the water for\nshort messages, due to network latency. But even doing an\napples-to-apples comparison with the Rust implementations, we found\nthat our implementation was marginally faster on average for OpenAI\ntokenization, and a bit less than twice as fast on average for\nAnthropic tokenization:\n\n\n  \n\n\nOne thing to note about the above graph is that, despite being faster,\nour variance was worse, which is probably due to GC pauses.  
This\ncould probably be brought down by being more careful about allocation,\nbut the variance just wasn’t a problem for this application.\n\nWe didn’t really expect to beat the performance of OpenAI’s\nimplementation, so that was a pleasant surprise!\n\nAsync tracing\n\nppx_tracing is an OCaml syntax extension that provides\nhigh-performance introspection capabilities for OCaml programs. To use\nit, all you have to do is add a small @trace annotation to an\nexisting function:\n\nlet[@trace \"demo\" \"compute\" ~n:(n : int)] compute n = (* ... *)\n\n\n\nThen, you just have to call Tracing_probes.start somewhere in your\nexecutable, and at runtime you’ll get a UI for viewing traces, built\non top of Perfetto:\n\n\n  \n\n\nWhen we released ppx_tracing internally, there was one main issue that\nkept people from really wanting to use it: it didn’t work with\nasynchronous code.  In particular, it couldn’t represent suspending\nexecution in one function and resuming in another.\n\nThis was a significant limitation, since most real-world programs do\nsomething asynchronous, say writing to a file or fetching data over\nthe network.  These operations are provided by the Async library,\nwhich lets us wrap asynchronous computations in “deferreds”—the\nOCaml equivalent of a Promise in Javascript.  Let’s consider the\nfollowing Async program for checking disk space usage:\n\nlet[@trace \"demo\" \"process\"] rec process_directory path =\n  let%bind stat = Filesystem_async.stat path in\n  [%trace.instant \"demo\" \"stat\"];\n  let num_bytes = Int63.to_int stat.size |&gt; Option.value ~default:0 in\n  match stat.kind with\n  | Regular -&gt;\n    return num_bytes\n  | Directory -&gt;\n    let%bind files = Filesystem_async.ls_dir path in\n    let%bind entry_sizes =\n      Deferred.List.map ~how:(`Max_concurrent_jobs 10) files ~f:(fun file -&gt;\n        let res = process_directory (path /?. 
file) in\n        res)\n    in\n    return (List.fold entry_sizes ~init:0 ~f:( + ))\n  | _ -&gt; return 0\n;;\n\n\n\nRunning the program with tracing enabled produces a confusing\nresult. Here, spans representing calls to process_directory only\ncapture the time taken to allocate a new deferred.  Further, since the\ntrace only keeps track of one pending function call, we can’t know\nwhich invocation of process_directory generated each instant event.\n\n\n  \n\n\nThe crux of Sasha Hydrie’s intern project was to figure out how to\nintegrate with the Async scheduler to keep track of multiple\nconcurrent execution contexts.  This required associating a unique ID\nwith each async function call.\n\nLuckily, the Async scheduler can store metadata along with each\ndeferred computation.  When switching to a new task, the scheduler\nmakes its data (known as the “execution context”) globally accessible.\nTherefore, upon entering an async function, we can store a unique\n“correlation ID” in the context.  This gives us the ability to\ndistinguish between multiple invocations of process_directory.\nLater on, when the task actually executes, we can query the current ID\nto see which function we’re in.\n\nSasha extended the PPX with a new annotation (@trace.async)\nimplementing this behavior.  The resulting IDs must end up in the\nfinal trace file, so he also updated our Fuchsia trace format tooling\nto support the relevant event types.  Now, updating our example with\n@trace.async gives a much more sensible result—we can see how long\neach call is actually in flight, and how many are executing in\nparallel.  Further, instant events are able to query the active\ncorrelation ID to determine which call they are a part of.\n\n\n  \n\n\nHowever, not all functions are asynchronous. We don’t want every\nsynchronous function to create a new track, but we also don’t want to\nmix them all together.  
Therefore, Sasha added a new async mode to the\nPPX, where synchronous function calls will also query the current\ncorrelation ID and dynamically attach to a parent async call.\n\nAlready Sasha’s work has enabled wider adoption of ppx_tracing; the\ntool should now work in most real-world programs. The team behind\nIron, our in-house code review system, used it to improve\ntheir server’s startup time by about 50%, within days of the final\nproject being released.\n\nGiven the success of the project, there is, of course, more work yet\nto do: we’re hoping to integrate ppx_tracing with a new distributed\ntracing system we’ve built, as well as perform more detailed tracing\nof the Async scheduler itself.\n\nSo, what are you waiting for?\n\nSummarizing the whole summer in just a handful of projects is an\nimpossible task, but I hope this gives you a flavor of the kinds of\nthings that our interns do.\n\nIf this sounds like fun, then you should\napply. The\ninternship is a great chance to learn, both about software\nengineering, and about the world of trading. And if you’d like to know\nmore about our interview process, take a look\nhere.\n\n",
        "url"      : "https://blog.janestreet.com/what-the-interns-have-wrought-2023/",
        "image"    : "https://blog.janestreet.com/what-the-interns-have-wrought-2023/interns.jpg",
        "topic"    :  ["technology","internship"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Oxidizing OCaml: Data Race Freedom",
        "date"     : "September 1, 2023",
        "authorId" : "mslater",
        "author"   : "Max Slater",
        "tags"     : [],
        "minsToRead" : 28,
        "content"  : "OCaml with Jane Street extensions is available from our public opam repo. Only a slice of the features described in this series are currently implemented.\n\nIn part one, we discussed how OCaml’s locality mode\nenables safe stack allocation.\nIn part two, we explored how\nthe uniqueness and linearity modes represent ownership.\nIn this (final) post, we leverage modes to define a statically data-race-free API in\nmulticore OCaml.\n\n\n\nData Races\n\nWith the recent release of OCaml 5, shared-memory parallelism has become a first-class language feature.\nOCaml programs now span multiple domains, each of which corresponds to an operating-system thread.\nHigher-level code can schedule work onto domains using async-await semantics.\n\nLet’s consider extending Async to support parallel tasks.\nUsing a simplified interface, we’ll spawn two invocations of a function before waiting on their results:\n\nval parallel : (unit -&gt; 'a) -&gt; 'a Deferred.t\nval await : 'a Deferred.t -&gt; 'a\n\nlet count () =\n    let i = ref 0 in\n    let incr () = for _ = 1 to 100_000 do i := !i + 1 done in\n    let p1, p2 = parallel incr, parallel incr in\n    let (), () = await p1, await p2 in\n    print_int !i\n;;\n\n\n\nNaively, combining parallelism with mutability leads to data races.\nWhen one domain writes to a memory location while others access it, readers will observe unspecified results.\nIn languages like C++, we’re forced to conclude that data races are undefined behavior.\nOn the other hand, OCaml’s memory model makes races less dangerous—despite producing unpredictable results,\nthey never invalidate type safety, so cannot crash the program.\n\nSo, what happens if we run the above code?\n\n\n\n  &gt; 118533\n\n  \n\n\nData races still cause bugs!\nCorrectly authoring a parallel-safe mutable data structure\nrequires careful usage of atomics, mutexes, or other traditional synchronization primitives.\nGetting it wrong can cause nondeterministic failures 
that are difficult to debug.\n\nRust is comparatively stricter: data races are unrepresentable,\nas all mutations must go through exclusively owned\nreferences. However, Rust’s underlying ownership system results in complicated code, especially\nin the presence of concurrency.\n\nThanks to OCaml’s garbage collector, most code does not need to worry about ownership.\nThis allows users to avoid the complexity of tracking ownership in the type system.\nIn this post, we seek to make data races unrepresentable without losing this ability.\nFirst, we’ll use modes to prohibit sharing mutable data between domains, only allowing mutation via ownership-based exclusive references.\nSecond, we’ll use capsules to safely encapsulate shared mutability without tracking fine-grained ownership constraints.\n\nThe Sync Mode\n\nThe sync mode describes values that can be freely shared between domains.\nValues are sharable if they do not allow concurrent writes, so:\n\n\n  Immutable values are always sync, as no writes ever occur.\n  Exclusively mutable values are also sync, since all writes must go through\nexclusive references.\n  Other mutable values are unsync. 
We have no control over when they may be written to, so they can’t be safely shared between domains.\n\n\ntype x = { x : int }\ntype y = { exclusively mutable y : int }\ntype z = { mutable z : int }\n\nlet sync x = { x = 0 }\nlet sync y = { y = 0 }\nlet sync z = { z = 0 }\n\n\n\n7 | let sync z = { z = 0 }\n                 ^^^^^^^^^\nError: found an unsync value where a sync value was expected.\n\n\nLike uniqueness, a sync parameter represents a promise by the caller.\n\n\n  \n    A sync parameter requires the caller to provide sharable values.\nThe callee is allowed to share the value with other domains.\n  \n  \n    An unsync parameter does not encode a promise from the caller; the\ncallee must not share the value with another domain.\n  \n\n\nA sync variable may be used as if it were unsync, so sync is the sub-mode.\nHence, unsync values may reference sync values, but not vice versa.\nFor example, a closure capturing an unsync value must also be unsync.\n\nUsage\n\nArmed with our new mode axis, we can make parallel require a sync closure.\nWe’ll want to schedule tasks that capture unique variables, so the closure will also be at mode once.\nLastly, since the function’s result will be passed back to our domain, it too must be sync.\n\nval parallel : (unit -&gt; 'a @ sync) @ sync once -&gt; 'a Deferred.t\n\n\nThis signature involves several modes, so let’s translate it to\na new version of the “@” syntax. For better readability, we’ll write one\nannotation that may expand function arrows.\n\nval parallel : (unit -&gt; 'a) -&gt; 'a Deferred.t\n             @ sync once (. 
-&gt; sync) -&gt; .\n\n\n\nCompiling our example program now results in an error:\n\nlet count () =\n    let i = ref 0 in\n    let incr () = for _ = 1 to 100_000 do i := !i + 1 done in\n    let () = await (parallel incr) in\n    print_int !i\n;;\n\n\n\n6 | let () = await (parallel incr) in\n                             ^^^^\nError: found an unsync value where a sync value was expected.\n\n\nThe closure incr captures a mutable reference, so it can’t be passed to parallel.\n\nThe sync mode is sufficient to prevent data races.\nLike Rust, shared values may only be mutated via exclusive references.\nAt runtime, we’ll find that our program partitions its mutable values per domain:\n\n \n\n\n\nIn this diagram, red circles are mutable; blue circles are exclusively-mutable or immutable.\nEach domain contains a collection of unsync values, each of which points toward some red circle.\nAll domains may reference sync values, which only see blue circles.\nCritically, there are no pointers into a domain: we cannot traverse from one domain’s unsync values to another’s.\n\nExclusive Mutability\n\nIn our new model, we can only share exclusively mutable data structures.\nNaively, we could convert our example program to use an exclusively mutable counter—but\nwe’ll find that exclusivity requires tracking the same precise ownership constraints we’d find in Rust.\nFor example:\n\ntype counter = { exclusively mutable i : int }\n\nlet count () =\n    let c = { i = 0 } in\n    let incr () = for _ = 1 to 100_000 do &c.i &lt;- &c.i + 1 done in\n    let () = await (parallel incr) in\n    print_int c.i\n;;\n\n\n\n6 | let () = await (parallel incr) in\n                             ^^^^\nError: this value escapes its region.\n\n\nHere, incr borrows the counter, making it a local closure—and parallel tasks must be global.\nWe can instead explicitly mark incr as global, causing the counter to be moved into the closure.\n\nlet count () =\n    let c = { i = 0 } in\n    let global incr () = for _ = 
1 to 100_000 do &c.i &lt;- &c.i + 1 done in\n    let () = await (parallel incr) in\n    print_int c.i\n;;\n\n\n\n5 | print_int c.i\n              ^\nError: c is used uniquely so cannot be used twice.\n\n\nBut now we can’t reference the counter outside incr—its ownership has been transferred.\nTo fix this, we can return the counter from the parallel task.\n\nlet count () =\n    let c = { i = 0 } in\n    let global incr () = for _ = 1 to 100_000 do &c.i &lt;- &c.i + 1 done; c in\n    let c = await (parallel incr) in\n    print_int c.i\n;;\n\n\n\n\n  val count : unit -&gt; unit\n\n&gt; 100000\n\n  \n\n\nSince parallel tasks cannot borrow, we’re forced to explicitly move the counter between domains—it’s never actually shared.\nYou might have seen this coming: if multiple domains reference an exclusively-mutable value, it can’t be mutated.\n\n \n\n\n\nTechnically, we got what we asked for: we can write parallel code without fear of data races.\nHowever, if we can never share exclusive references, exclusive mutability isn’t useful in practice.\nTherefore, we will introduce mutexes.\n\nMutexes\n\nTraditionally, mutable state can be safely shared between domains via careful use of locks.\nThe simplest kind of lock is a mutex, which assures that only one domain can access a\nvalue at a time.\n\nIn OCaml, a mutex interface might look like the following:\nmodule Mutex : sig\n    type 'a t\n\n    val create : 'a -&gt; 'a t\n               @ unique sync -&gt; .\n\n    val with_lock : 'a t -&gt; ('a -&gt; 'b) -&gt; 'b\n                  @ . -&gt; local (local exclusive -&gt; .) 
-&gt; .\nend\n\n\n\nCalling with_lock locks the mutex, borrows the wrapped value, runs the callback, and finally unlocks the mutex.\nIf the mutex was already locked, we know some other domain is accessing the value,\nso we first wait for it to release the mutex.\nTherefore, mutexes can be safely shared across domains.\n\nWe can illustrate a mutex as a green circle.\nIts single outgoing pointer may only be traversed by one domain at a time, so it provides an exclusive reference.\n\n \n\n\n\nImportantly, mutexes are created from unique sync values.\nStipulating sync assures that no unsync value—which may be visible to another domain—is ever made accessible via the mutex.\nUniqueness additionally guarantees that the mutex holds the only pointer to its contents.\n\ntype counter = { exclusively mutable i : int }\n\nlet count () =\n    let mutex = Mutex.create { i = 0 } in\n    let worker () =\n        for _ = 1 to 100_000 do\n            Mutex.with_lock mutex (fun counter -&gt; counter.i &lt;- counter.i + 1)\n        done\n    in\n    let p1, p2 = parallel worker, parallel worker in\n    let (), () = await p1, await p2 in\n    let result = Mutex.with_lock mutex (fun counter -&gt; counter.i) in\n    print_int result\n;;\n\n\n\n\n  &gt; 200000\n\n  \n\n\nThe addition of locks gives us capabilities roughly equivalent to safe Rust: we can trade off exclusive borrows\nwithout explicitly moving the underlying value.\nHowever, this API is not very flexible.\nFirst, we must create a mutex for each shared value.\nSecond, we’re still restricted to sync data—we can’t just wrap an existing mutable data structure in a mutex.\n\nShared Mutability\n\nRestricting ourselves to exclusive mutability means we can’t use shared-ownership data structures.\nTo see why, let’s attempt to implement a mutable circularly linked list.\n\ntype 'a node = { data : 'a\n               ; exclusively mutable next : 'a node }\n\nlet create data =\n    let rec node = { data; next = node } in\n    
node\n;;\n\n\n\n\n  val create : 'a -&gt; 'a node\n\n  \n\n\nSince create does not return a unique list, we’ll never be able to mutate the result—even within a domain, we can never exclusively reference a cyclic value.\n\n \n\n\n\nThis problem is endemic in Rust, where shared mutation requires unsafe code.\nOCaml’s garbage collector enables a safe solution—mutable fields—but the resulting data structure isn’t sync.\nIf we try to use a mutable value in parallel, we’ll get an error:\n\ntype 'a node = { data : 'a\n               ; mutable next : 'a node }\n\nlet count () =\n    let rec head = { data = 0; next = head } in\n    let append () = head.next &lt;- { data = 1; next = head } in\n    let () = await (parallel append) in\n    print_int head.next.data\n;;\n\n\n\n7 | let () = await (parallel append) in\n                             ^^^^^^\nError: found an unsync value where a sync value was expected.\n\n\nRust works around this issue by allowing shared-ownership data structures to present\nan exclusively-mutable API.\nThis abstraction allows the type to be protected by a mutex: its underlying shared mutability is\nencapsulated in its unsafe implementation.\n\n \n\n\n\nIn OCaml, we need a way to encapsulate unrestricted mutability without using unsafe language features.\nFor this purpose, we introduce capsules.\n\nCapsules\n\nIn our initial design, each domain was allowed to reference a collection\nof unsync data.\nFor example, a mutable circularly linked list:\n\n \n\n\n\nThere are no pointers into our domain, so no other domains can access our list.\n\nWe can relax this restriction by creating collections of unsync data that aren’t associated with a particular domain.\nWe call these collections capsules.\nCapsules allow incoming green pointers, which may only be traversed by one domain at a time.\n\n \n\n\n\nTo enforce exclusivity, we’ll need a way to indicate which domain is currently accessing which capsule.\nThis is the purpose of keys, which have a one-to-one 
correspondence with capsules.\n\nmodule Capsule : sig\n\n  module Key : sig\n    type 'k t\n\n    type packed = Key : 'k t -&gt; packed\n  end\n\n  val create : unit -&gt; Key.packed\n             @ . -&gt; unique\n  (* ... *)\nend\n\n\n\nWhen we create a capsule, we get back a Key.packed, which hides the underlying type 'k.\nWe know that this 'k exists, but that’s all we know—so we call 'k an existential type.\n\nConcretely, creating a capsule mints a brand-new 'k that is associated with only the returned key.\nThis mechanism uniquely identifies keys based on their creation site, allowing us to distinguish between them.\nFor example:\n\nval require_same_key : 'k Capsule.Key.t -&gt; 'k Capsule.Key.t -&gt; unit\n\nlet distinct_keys () =\n    let Key key0 = Capsule.create () in\n    let Key key1 = Capsule.create () in\n    require_same_key key0 key1\n;;\n\n\n\n6 | require_same_key key0 key1\n                          ^^^^\nError: this expression has type $Key_'k1 Capsule.Key.t\n       but an expression was expected of type $Key_'k0 Capsule.Key.t\n\n\nWe can never pass two different keys to require_same_key, because distinct keys necessarily have distinct types.\n\nPointers\n\nWhen a domain has exclusive access to a key, it may traverse any\ngreen pointer into the associated capsule.\nWe can model this behavior with the following signature, where a\ncapsule pointer of type ('a, 'k) t references a value of type 'a and\nrequires the 'k Key.t to traverse.\n\nmodule Capsule : sig\n\n  (* ... *)\n\n  module Ptr : sig\n\n    type ('a, 'k) t\n\n    val create : ('a -&gt; 'b) -&gt; 'a -&gt; ('b, 'k) t\n               @ local once sync (sync -&gt; .) -&gt; sync -&gt; .\n\n    val map : 'k Key.t -&gt; ('a -&gt; 'b) -&gt; ('a, 'k) t -&gt; ('b, 'k) t\n            @ local exclusive -&gt; local once sync -&gt; . -&gt; .\n\n    val extract : 'k Key.t -&gt; ('a -&gt; 'b) -&gt; ('a, 'k) t -&gt; 'b\n                @ local exclusive -&gt; local once sync (. -&gt; sync) -&gt; . 
-&gt; sync\n  end\n\n  val destroy : 'k Key.t -&gt; ('a, 'k) Ptr.t -&gt; 'a\n              @ unique -&gt; . -&gt; .\nend\n\n\n\nThe first three functions each require a local once sync callback. The annotation indicates that\nthe callback will not be stored, will only be called once, and cannot close\nover unsync values.\n\nThat’s a lot to take in—let’s unpack each new operation.\n\n\n  \n    create initializes a capsule containing the result of a function.\nBecause we’re making the function’s input available to another capsule, it must be sync.\nIts output, however, may be unsync, allowing us to create mutable state.\n\n     (* Create a mutable reference inside a capsule. *)\n let pointer = Capsule.Ptr.create (fun i -&gt; ref i) 0\n\n    \n     \n\n  \n\n    Creating a pointer does not require a key, since it does not provide access to any preexisting data.\n  \n  \n    map executes a function inside an existing capsule, arbitrarily restructuring the data therein.\nIt requires an exclusive reference to the key, implying that only one domain can run map at a time.\n\n    For example, we can wrap our reference in another reference, creating a mutable chain:\n\n    let Key key = Capsule.create ()\nlet pointer = Capsule.Ptr.create (fun i -&gt; ref i) 0\n(* Wrap the reference in another reference. *)\nlet pointer = Capsule.Ptr.map &key (fun r -&gt; ref r) pointer\n\n    \n\n     \n\n  \n\n    Critically, map also allows us to create additional pointers into the same capsule.\n  In this example, map returns a pointer to our newly created state, which\n  we may access using the same key.\n  \n  \n    extract also executes a function inside a capsule, but directly returns its result to the caller.\nThe returned value must be sync, as it is now visible to the calling domain.\n\n    let Key key = Capsule.create ()\nlet pointer = Capsule.Ptr.create (fun i -&gt; ref i) 0\n(* Return an immutable value read from the reference. 
*)\nlet result = Capsule.Ptr.extract &key (fun i -&gt; !i) pointer\n\n    \n\n     \n\n  \n\n    Conveniently, we can match the type of each function with the mode of its callback.\n  Creation inputs a sync value, mapping a value to a pointer;\n  extraction outputs a sync value, mapping a pointer to a value.\n  Similarly, map neither inputs nor outputs a sync value, so maps pointers to pointers.\n  \n  \n    destroy merges a pointer into the caller’s capsule.\nWe must provide the associated key uniquely, guaranteeing that no other domain can traverse this pointer again.\n\n    let Key key = Capsule.create ()\nlet pointer = Capsule.Ptr.create (fun i -&gt; ref i) 0\n(* Merge the reference into the enclosing capsule. *)\nlet i = Capsule.destroy key pointer\n\n    \n\n     \n\n  \n\n    In this diagram, destroy recolors a green pointer to black by consuming its associated key.\n  \n\n\nCapsules provide two important capabilities that mutexes do not.\n\n\n  \n    Aggregating ownership: a mutex always protects a unique value, that is, a single pointer.\nCapsules associate a semantic key with arbitrarily many pointers.\n  \n  \n    Protecting shared mutable state: mutexes only contain sync values.\nCapsules enable us to allocate shared mutable data structures without tracking their ownership at the type level.\n  \n\n\nPutting it All Together\n\nWe’ve built a way to trade off exclusive references (mutexes), and a way to associate a key with a dynamic data structure (capsules).\nCombining these approaches—storing keys in mutexes—lets us create a statically data-race-free version of any existing data structure!\n\nFor example, a hash table:\n\nmodule Locked_hashtbl = struct\n    type t = Table :\n        { table : ((int, string) Hashtbl.t, 'k) Capsule.Ptr.t;\n          lock  : 'k Capsule.Key.t Mutex.t } -&gt; t\n\n    let create () =\n        let Key key = Capsule.create () in\n        let table = Capsule.Ptr.create Hashtbl.create () in\n        let lock = Mutex.create key 
in\n        Table { table; lock }\n    ;;\n\n    let add_exn ( Table { table; lock } ) k v =\n        Mutex.with_lock lock (fun key -&gt;\n            Capsule.Ptr.extract key (fun table -&gt;\n                Hashtbl.add_exn table k v) table)\n    ;;\n\n    let length ( Table { table; lock } ) =\n        Mutex.with_lock lock (fun key -&gt;\n            Capsule.Ptr.extract key Hashtbl.length table)\n    ;;\nend\n\n\n\nThe implementation is almost as concise as wrapping a table with a mutex\nin C++ or Java, but far less error-prone.\nMixing up locking is a compilation error—once it compiles, we can be sure\nthat our code is free of data races.\n\nlet count () =\n    let table = Locked_hashtbl.create () in\n    let worker start () =\n        for i = start to start + 99_999 do\n            Locked_hashtbl.add_exn table i (Int.to_string i)\n        done\n    in\n    let p1, p2 = parallel (worker 0), parallel (worker 100_000) in\n    let (), () = await p1, await p2 in\n    let result = Locked_hashtbl.length table in\n    print_int result\n;;\n\n\n\n\n  &gt; 200000\n\n  \n\n\nPerformance-minded readers might be concerned about the runtime overhead of this approach—but\nwith two clever observations, we may compile it to a simple low-level lock.\n\n\n  \n    Keys do not contain information; we either have one or we don’t.\nThat means they take zero bits to store, and don’t have to exist at runtime at all.\n  \n  \n    The mutex is essentially a key-option, hence only takes on two values—locked and unlocked.\nAt runtime, the mutex is a bool; acquire/release are atomic test-and-set instructions.\n  \n\n\nMore practically, we could implement locking using futexes.\nFortunately, separating the concepts of keys and locks lets users swap between\nlock implementations without rewriting their data structures.\n\nAfterword\n\nGiven the initial success of OCaml 5, we’re optimistic about the future of writing\nhigh-performance parallel programs in OCaml.\nWe believe data-race-free 
OCaml will enable users to write correct and efficient code,\nwithout complicating the single-core experience.\n\nAt Jane Street, we’ve been using locality in production for over a year, and\nhave recently merged compiler support for the unique and once modes.\nOnce they’ve been extensively tested in our internal environment, we hope to contribute these features to upstream OCaml.\nInstructions for installing OCaml with Jane Street extensions using opam\nare available on GitHub.\n\nThis post concludes Oxidizing OCaml, but it’s not the end for modes.\nWe’re planning to extend the design presented here with sync’s dual mode axis,\nallowing for more fine-grained restrictions on data access.\nSimilarly, we’re exploring modal representations for algebraic effects,\noff-heap data, and compile-time computations—as well as mode-polymorphic functions.\n",
        "url"      : "https://blog.janestreet.com/oxidizing-ocaml-parallelism/",
        "image"    : "https://blog.janestreet.com/oxidizing-ocaml-parallelism/oxidizing-ocaml-parallelism.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "We're sponsoring SoME3",
        "date"     : "July 6, 2023",
        "authorId" : "cfalls",
        "author"   : "Craig Falls",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "Jane Street is excited to announce our sponsorship of\nSoME3, Grant Sanderson and James Schloss’s\nthird Summer of Math Exposition. SoME is a contest that Grant and\nJames created to encourage the development of fun and interesting\nmathematics education videos.\n\nWe've long been big fans of Grant and his YouTube channel,\n3Blue1Brown.\nThere's a lot of great math content on YouTube these days, but Grant\nhas a really unique voice in the space. It's a rare talent to explain\nthings in such a compelling way. Some years ago we sponsored one of\nGrant's videos, and\nwe've been waiting since then for another chance to work with him. (One\nof our own even tried his hand at making a\nvideo using\nGrant's software!)\n\nSo often, mathematics is presented in its finished form, where some\ndefinitions drop out of nowhere, and those definitions lead to certain\nconsequences, and then maybe, if you're lucky, you find some possible\napplications. Grant's approach is often the opposite—starting with a\nsort of natural question or problem that leads to other questions. At\nsome point, you discover the core issue, and this core idea turns out to\nbe applicable in many other cases. The definitions emerge naturally from\nwhat minimal structure a problem needs to have in order for that core\nidea to apply. Combined with great taste in animations and thoughtful\npacing, it's a lot of fun!\n\nThe hope with SoME3, in the short term, is to encourage more people to\ncreate online math content and to make it easier to find by a broader\naudience—creating a sort of curated list of Grant-approved content.\nLonger term, encouraging more people to get involved in creating math\ncontent should lead to a more diverse collection of creators and\ncontent.\n\nWhile a lot of our work involves mathematics—representing Gaussian\ndistributions as covariance matrices, say—math education actually\nhits closer to a lot of what we do. 
Being able to share intuitive\nunderstandings of the math we use, so that the applications and the math\nunderlying them are tied closely together, is critical to our success.\nOur strategy has always been to hire smart, hard-working and curious\npeople, and teach them the math, programming, and trading concepts they\nneed to succeed here.\n\nWe're excited to now help out with SoME!\n",
        "url"      : "https://blog.janestreet.com/were-sponsoring-some3/",
        "image"    : "https://blog.janestreet.com/were-sponsoring-some3/techblog-some3.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Oxidizing OCaml: Rust-Style Ownership",
        "date"     : "June 21, 2023",
        "authorId" : "mslater",
        "author"   : "Max Slater",
        "tags"     : [],
        "minsToRead" : 23,
        "content"  : "OCaml with Jane Street extensions is available from our public opam repo. Only a slice of the features described in this series are currently implemented.\n\nIn part one, we discussed how OCaml’s\nlocality mode enables safe stack allocation. In this post, we will\nexplore additional modes for representing ownership.\n\n\n\nOwnership\n\nOCaml isn’t a pure language—in fact, records containing mutable\nfields are relatively common.  The compiler also doesn’t track where\nvalues may be referenced, so must assume all values are potentially\nshared. At runtime, this means several parts of the program may hold\npointers to any particular piece of data.\n\nShared mutable data is notoriously difficult to reason about, and\ndoubly so with the addition of concurrency.  Conversely, pure\nfunctions provide useful guarantees, so OCaml encourages a side-effect\nfree style.  For example, records support functional updates, which\npreserve immutability:\n\ntype 'a box = { x : 'a option }\n\nlet clear box =\n    { box with x = None }\n;;\n\n\n\n  val clear : 'a box -&gt; 'b box\n\n  \n\n\nUnfortunately, immutability comes at a cost. A functional update\nallocates a new record, since other code may hold references to the\noriginal record. We illustrate the result here—purple blocks are\nlive values, which become red when unreferenced:\n\n \n\n\n\n\nWriting to an explicitly mutable field would not incur this\nallocation, but instead exposes us to shared mutability.\n\nUltimately, immutable updates only approach this problem from one\ndirection. We could instead imagine eliminating the shared portion\nof “shared mutability.”  If we knew that our code holds the only\nreference to the original record, the functional update immediately\nmakes the original value unreferenced—i.e. available for\ncollection and re-use. 
Therefore, the runtime could equivalently\nperform an in-place update, even though the type exposes an\nimmutable interface.\n\n \n\n\n\n\nOne of Rust’s primary design goals is to entirely forbid shared\nmutability, which it achieves by only allowing mutable values to be\nreferenced uniquely. In OCaml, uniqueness isn’t a strict necessity,\nsince the garbage collector can handle cyclic lifetimes arising from\nunrestricted mutability.  However, we still stand to benefit from the\noptimizations and expressivity enabled by unique references.  In fact,\nin part three of this series, we’ll see how uniqueness enables us to statically prohibit\ndata races in multicore OCaml.\n\nThe Uniqueness Mode\n\nA variable has the unique mode when it is the only reference to a\nparticular value in the entire program. Variables without this\nrestriction have the default shared mode.\n\nPreviously, we saw that the compiler tracks which variables may escape\ntheir region; these were global.  Uniqueness can be handled in a\nsimilar fashion—tracking which variables may have aliases tells\nus which are shared.  Unique variables are hence those used at most\nonce.\n\nFor example, given a function that takes a unique argument:\nval consume : 'a @ unique -&gt; unit\n\n\nWe’re no longer allowed to use this function in a way that violates uniqueness.\nlet foo (unique x) =\n    consume x;\n    x\n;;\n\n\n\n3 | x\n    ^\nError: x is used uniquely so cannot be used twice.\n\n\nOCaml’s uniqueness mode mirrors a similar concept in Rust, where\nvalues are unique by default. When passed to a function “by value,” a\nRust variable is moved. Ownership is transferred to the callee,\nwhich has free rein to mutate or destroy the value.\n\nSub-Moding\n\nA unique variable may be safely used as if it were shared, so unique\nis a sub-mode of shared. 
This hierarchy is reversed compared to\nlocality: the non-default mode, unique, is the sub-mode.\n\nIn this sense, unique behaves like global and shared behaves\nlike local.  For example, we can construct a shared variable by\nreferencing a unique value, but not vice versa.\n\nlet bar (unique x) =\n    x, x\n;;\n\n\n\n  val bar : 'a @ unique -&gt; 'a * 'a\n\n  \n\n\nUniqueness’ submoding relation makes it more intuitive to interpret\na unique function parameter as a promise by the caller.\n\n\n  \n    A unique parameter means the caller promises to provide uniquely\nreferenced values. The callee is hence allowed to overwrite the\nvalue.\n  \n  \n    A shared parameter does not encode a promise from the caller; the\ncallee must assume the parameter is aliased.\n  \n\n\nStrong Updates\n\nUsing the following syntax, we can perform an in-place functional\nupdate on a unique record.\n\nlet clear (unique box) =\n    { overwrite box with x = None }\n;;\n\n\n\n  val clear : 'a box @ unique -&gt; 'b box @ unique\n\n  \n\n\nIt may alarm you to notice that clear returns a 'b\nbox—does this let us cast the box to a new type in-place?\nThe answer is yes, and it’s perfectly safe to do so. No other code\nmaintains a reference to the box, so it cannot be referred to as a\nvalue of the stale type.\n\nThis result demonstrates that unique variables provide another\nimportant capability: updating a value’s type in-place. 
Such an\noperation is known as a strong update, and lets us write (for\nexample) an in-place map function:\n\nlet map (f : 'a @ unique -&gt; 'b @ unique) (unique box) =\n    match box.x with\n    | None -&gt; { overwrite box with x = None }\n    | Some x -&gt; { overwrite box with x = Some (f x) }\n;;\n\n\n\n  val map : ('a @ unique -&gt; 'b @ unique) -&gt; 'a box @ unique -&gt; 'b box @ unique\n\n  \n\n\nBorrowing\n\nUnique variables also empower users to write safer API contracts.\nProcedural resource management is one obvious use case, so let’s\nattempt to define a simple set of functions for manipulating files.\n\ntype file\n\nval open_ : path:string -&gt; file @ unique\nval read  : file -&gt; string\nval close : file @ unique -&gt; unit\n\n\n\nSince close requires a unique parameter, we can be sure that files\nare never used after being closed.\n\nWhat happens when we try to use this API?\n\nlet print_and_close (unique file) =\n    print_endline (read file);\n    close file\n;;\n\n\n\n3 | close file\n          ^^^^\nError: file is used uniquely so cannot be used twice.\n\n\nEven though read does not require a unique parameter, we’re no\nlonger allowed to close the file.  Using a unique value at the shared\nmode is a one-way transformation: once we create a single shared\nreference to file, we’re no longer allowed to use file uniquely.\n\nThis behavior seems overly restrictive. We know that read\nitself doesn’t use file uniquely—so if it also doesn’t leak\nany references to file, we could keep using file uniquely after\nread returns. Fortunately, we’ve already discussed\na mode that expresses this constraint: locality!\n\nval read : file @ local -&gt; string\n\n\nlet print_and_close (unique file) =\n    print_endline (read &file);\n    close file\n;;\n\n\n\n  val print_and_close : file @ unique -&gt; unit\n\n  \n\n\nThe syntax &file denotes borrowing, which weakens the mode of\nfile to local shared. 
The resulting reference cannot be leaked by\nread, so after read returns, we still have the only reference to\nfile.\n\nIn Rust, borrowing is less restrictive, as the callee is parameterized\nover the true lifetime of the borrowed value.\n\nExclusivity\n\nLet’s expand our API to include a write function.  A file represents\naccess to mutable state—and we’re avoiding shared\nmutability—so write should require a unique file.\n\nval write : file @ unique -&gt; string -&gt; unit\n\n\n\nBut when we try to use this function…\n\nlet write_and_close (unique file) =\n    write file \"data\";\n    close file\n;;\n\n\n\n3 | close file\n          ^^^^\nError: file is used uniquely so cannot be used twice.\n\n\nIf we pass file uniquely, it cannot be used again.  While we could\nreturn file from write, piping unique values through every\nfunction call isn’t a good solution.  Ideally, we want to express that\nalthough our function mutates file, it does not consume it.\n\nTo better handle this case, we add a third mode to the uniqueness\naxis: exclusive, which sits in between the initial options. A\nvariable at the exclusive mode is the only active reference to a\nparticular value.\n\nSpecifically, exclusivity expresses that while other references to\nthis value may exist, none are accessible until the conclusion of the\nexclusive reference’s region. Therefore, we may freely mutate an\nexclusive value, but we may not use it uniquely.\n\nval write : file @ local exclusive -&gt; string -&gt; unit\n\n\nlet write_and_close (unique file) =\n    write &file \"data\";\n    close file\n;;\n\n\n\n  val write_and_close : file @ unique -&gt; unit\n\n  \n\n\nBecause the call to write borrows file exactly once, &file has\nmode local exclusive.  
If file was borrowed multiple times, its\nmode would instead be weakened to local shared.\n\nval write2 : file @ local exclusive -&gt; file @ local exclusive -&gt; string -&gt; unit\n\n\nlet write_twice (unique x) =\n    write2 &x &x \"data\"\n;;\n\n\n\n2 | write2 &x &x \"data\"\n              ^^\nError: found a shared value where an exclusive value was expected.\n\n\nExclusively Mutable Fields\n\nThe exclusive mode is not just useful for resource management. It\nalso expresses that a function may mutate a record, but never\nstrongly updates it.  Given an exclusive variable, we know our code\nhas the only active reference to the value, but later code may inspect\nit at the current type.\n\nTherefore, we also introduce exclusively mutable fields:\n\ntype counter = { exclusively mutable i : int }\n\n\n\nAn exclusively mutable field behaves as you might expect. It may be\nwritten to via an exclusive reference, but appears immutable when\naccessed via a shared reference. With exclusively mutable fields, we\ncan enjoy the performance benefits of mutability without the strict\nmove semantics of uniqueness.\n\nlet increment (local exclusive counter) =\n    counter.i &lt;- counter.i + 1\n;;\n\n\n\n  val increment : counter @ local exclusive -&gt; unit\n\n  \n\n\nMarking individual fields as exclusively mutable is a natural\nextension of OCaml’s current mutability story, but notably diverges\nfrom Rust’s approach. In Rust, variable bindings are marked as mutable\nupon declaration—that is, mutability becomes another deep\nproperty. (Perhaps a mode!) Despite this difference, OCaml’s\nuniqueness mode axis now mirrors Rust’s hierarchy of references:\n\n\n  \n    A unique variable is akin to a value in Rust. 
Passing a parameter by\nvalue allows the callee to destroy it.\n  \n  \n    An exclusive variable is akin to a mutable reference in Rust.\nPassing a parameter by mutable reference allows the callee to\noverwrite, but not destroy, the value.\n  \n  \n    A shared variable is akin to an immutable reference in Rust.\nPassing a parameter by immutable reference restricts the callee to\nreading or copying the value.\n  \n\n\nThere is one important difference: OCaml’s references are not\nparameterized over lifetime variables. Like we saw in\npart one, this means uniqueness doesn’t lead to\nhigher-order polymorphism and doesn’t interfere with type inference.\n\nClosures & Linearity\n\nSo far, we’ve seen how the uniqueness and locality axes interact to\nrepresent ownership. Unfortunately, we can now write the following\ncode…\n\ntype 'a box = { x : 'a option }\n\nlet wrap (unique box) =\n    fun () -&gt; box\n;;\n\nlet unsound (unique box) =\n    let get = wrap box in\n    let a = get () in\n    let b = get () in\n    { overwrite a with x = Some 0 }, { overwrite b with x = Some \"string\" }\n;;\n\n\n\nNaively, applying get simply returns the wrapped box. But that means\ninvoking get twice creates two supposedly unique references to the\nsame box. Violating uniqueness breaks type safety, as we can use\nstrong updates to refer to the same value at multiple types.\n\n\n  val wrap : 'a @ unique -&gt; unit -&gt; 'a @ unique\nval unsound : 'a box @ unique -&gt; int box @ unique * string box @ unique\n\n  \n\n\nTherefore, we must not allow wrap to return a closure with signature\nunit -&gt; 'a box @ unique. We know the resulting function may be\nexecuted once, but is unsafe to invoke a second time. More\ngenerally, any closure capturing a unique value must be run at most\nonce, as its first invocation may consume the value.\n\nThe Linearity Mode\n\nTo encode this restriction, we will define a third mode axis:\nlinearity. 
In practice, linearity modes will almost exclusively apply\nto variables of a function type (i.e. closures), but this is not a\nrequirement.\n\n\n  \n    A variable with mode once carries the restriction of being used\nat most once. It is allowed to close over unique and once\nvariables.\n  \n  \n    A variable with the default mode many does not have this\nrestriction; it may be used freely and may only close over\nshared/many variables.\n  \n\n\nA many variable can be used as if it were once, so many is a\nsub-mode of once. Like locality, the salient mode (once) is the\nsuper-mode.  From a parameter perspective, once represents a promise\nby the callee:\n\n\n  \n    A once parameter means the callee promises not to use the value\nmore than once.\n  \n  \n    A many parameter encodes no promise; the callee may use the value\nany number of times.\n  \n\n\nlet baz (once x) =\n    x, x\n;;\n\n\n\n2 | x, x\n       ^\nError: found a once value where a many value was expected.\n\n\nIn some sense, linearity is dual to the uniqueness axis. Both\nanalyses are concerned with aliasing: a once variable prohibits the\ncreation of new references to its value, whereas a unique variable\nstates that no such references exist.\n\nConsuming Variables\n\nReturning to our unsound example, wrap now returns a closure at mode\nonce.  When we try to invoke the result twice, the compiler raises\nan error:\n\nval wrap : 'a @ unique -&gt; (unit -&gt; 'a @ unique) @ once\n\n\nlet unsound (unique box) =\n    let get = wrap box in\n    let a = get () in\n    let b = get () in\n    { overwrite a with x = Some 0 }, { overwrite b with x = Some \"string\" }\n;;\n\n\n\n4 | let b = get () in\n            ^^^\nError: found a once value where a many value was expected.\n\n\nLike we saw with locality, a modal return places a requirement on the\ncaller. 
In this case, whoever calls wrap promises not to use its\nresult multiple times, as desired.\n\nRust offers a slightly less expressive solution: closures that consume\nvalues implicitly implement the FnOnce trait, which only allows one\ninvocation. Unlike OCaml’s linearity modes, this trait specifically\nbounds function types.\n\nBorrowing Variables\n\nSo far, we’ve conspicuously omitted closures that capture exclusive\nreferences.  When a closure borrows a variable, it cannot use the\nvalue uniquely, so the resulting function is safe to run multiple\ntimes—that is, it needn’t be once.\n\nHowever, borrows still come with restrictions.  Borrowing always\ncreates local references, so a closure that borrows a variable must\nitself be local:\n\ntype 'a box = { exclusively mutable x : 'a option }\n\nlet wrap (unique box) =\n    let clear () = &box.x &lt;- None in\n    clear\n;;\n\n\n\n5 | clear\n    ^^^^^\nError: this value escapes its region.\n\n\nIn Rust, closures that borrow variables exhibit similar\nbehavior—the function’s lifetime cannot exceed that of the\nreferenced value.\n\nHowever, we already know clear can never outlive box, because\nbox is global.  We may instead move box into the closure and\nallow clear to escape.  To do so, we add an explicit global\nannotation:\n\nlet wrap (unique box) =\n    let global clear () = &box.x &lt;- None in\n    clear\n;;\n\n\n\n  val wrap : 'a box @ unique -&gt; (unit -&gt; unit) @ separate\n\n  \n\n\nOwnership is transferred to the closure, so box may not be used\noutside of clear.  In Rust, this behavior can be requested by adding\nthe move keyword to a closure.\n\nSeparate Functions\n\nIt was slightly misleading to say that closures capturing exclusive\nreferences can be run multiple times—this is true, but they’re\nnot re-entrant.  
Intuitively, when a closure stores an exclusive\nreference, invocations must use the reference one at a time; in other\nwords, they cannot overlap.\n\nThis limitation is encoded by a third mode in the linearity axis,\nseparate.  The separate mode prohibits the creation of new\nreferences to a value during its execution. Therefore, a separate\nclosure may be invoked repeatedly, but only one invocation can be\nactive at any point in time. For example:\n\nlet separate run f = f () in\n\nrun (fun _ -&gt; ());\n\nrun (fun _ -&gt; run (fun _ -&gt; ()))\n\n\n\n5 | run (fun _ -&gt; run (fun _ -&gt; ()))\n                  ^^^\nError: found a separate value where a many value was expected.\n\n\nLuckily, most higher-order functions do not require re-entrancy. For\nexample, List.iter doesn’t overlap applications of its callback, so\nit can take a separate parameter.\n\nlet rec iter list f =\n    match list with\n    | [] -&gt; ()\n    | x :: xs -&gt; f x; iter xs f\n;;\n\n\n\n  val iter : 'a list -&gt; ('a -&gt; unit) @ separate -&gt; unit\n\n  \n\n\nNow, callers are able to provide callbacks that close over exclusive\nreferences.\n\nlet unique x = () in\n\niter [] (fun _ -&gt; write &x);\n\niter [] (fun _ -&gt; consume x)\n\n\n\n5 | iter [] (fun _ -&gt; consume x)\n            ^^^^^^^^^^^^^^^^^^^^\nError: found a once value where a separate value was expected.\n\n\nThe separate mode also has a corresponding Rust trait.  In Rust,\nclosures that capture mutable (i.e. exclusive) references implicitly\nimplement the FnMut trait, which makes invocation require a mutable\nreference.  Because OCaml does not rely on type bounds to encode\nlinearity constraints, type inference is again unaffected, whereas\nRust’s function traits further complicate the type system.\n\nOwnership in Practice\n\nJane Street’s compilers team is currently implementing the uniqueness\nand linearity axes. 
Based on a design by Leo White and Stephen Dolan,\nAnton Lorenzen’s summer 2022 intern project produced an initial\nuniqueness analysis. Zesen Qian is now extending the project to\nsupport borrowing and linearity.\n\nDespite locality’s success, one might be skeptical that the uniqueness\nand linearity modes will provide more than minor optimizations. After\nall, OCaml already supports explicitly mutable fields that may be\nwritten to without incurring allocations.\n\nHowever, with the release of OCaml 5, mutable fields are now subject\nto data races. If multiple domains access a field and at least one\nperforms a write, results observed by readers are\nunspecified—though the program remains type\nsafe. Fortunately, the\naddition of exclusively mutable fields gives us a powerful new tool to\ncombat this class of concurrency bugs. In part three, we will use modes to\ndefine a statically data-race-free API in multi-core OCaml.\n",
        "url"      : "https://blog.janestreet.com/oxidizing-ocaml-ownership/",
        "image"    : "https://blog.janestreet.com/oxidizing-ocaml-ownership/oxidizing-ocaml-ownership.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Oxidizing OCaml: Locality",
        "date"     : "May 26, 2023",
        "authorId" : "mslater",
        "author"   : "Max Slater",
        "tags"     : [],
        "minsToRead" : 20,
        "content"  : "OCaml with Jane Street extensions is available from our public opam repo. Only a slice of the features described in this series are currently implemented.\n\n\n\nRust, OCaml, and Resource Management\n\nComing from OCaml, the Rust programming language has many appealing\nfeatures.  Rust’s system for tracking lifetime and ownership allows\nusers to safely express patterns that are awkward in OCaml, such as:\n\n\n  Stack-allocated values and custom allocation schemes.\n  Managed resources that can’t be (easily) garbage collected,\ne.g. file descriptors or GPU memory.\n  Mutable data structures in the presence of concurrency.\n\n\nOn the other hand, Rust’s approach comes with some trade-offs.\nEschewing garbage collection requires careful consideration of lifetime\nand ownership throughout a codebase. Emphasizing lifetime-polymorphism\ncan also make type inference untenable, a design choice that wouldn’t\nfit OCaml.\n\nAt Jane Street, we’ve been working on extending OCaml to better support\nthese use cases, without giving up the principles that make OCaml a\nconvenient and flexible language.\n\nTo do so, we’re introducing a system of modes, which track\nproperties like the locality and uniqueness of OCaml values. Modes\nallow the compiler to emit better, lower-allocation code, empower\nusers to write safer APIs, and with the advent of multicore,\nstatically guarantee data race freedom—all in a lightweight way\nthat only affects those in need.\n\nStack Allocation\n\nThe OCaml compiler does not statically track lifetimes. Instead, it\nrelies on a garbage collector to figure out a suitable lifespan for each\nvalue at runtime. Values are collected only after they become\nunreferenced, so OCaml programs are memory-safe.\n\nTo a first approximation, this model requires allocating all values on\nthe heap. 
Fortunately, OCaml’s generational\nGC can\nefficiently handle short-lived values—minor-heap allocation\nsimply bumps a pointer.\n\n\n\n\n\n\n\nHowever, placing everything on the heap is still a pessimistic\napproach. Where possible, using a specialized allocator could improve\nperformance. For example, the minor heap is typically larger than cache,\nso future allocations are likely to evict live values.  Stack allocation\nwould immediately re-use freed space, eliminating this concern.\n\n\n\n\n\n\n\nProviding an alternative to heap allocation would also have other\nbenefits:\n\n\n  \n    Every minor heap allocation brings us closer to the next minor\ncollection cycle. A minor collection incurs some fixed overhead,\nbut more importantly, frequent collection causes more values to be\nmoved to the major heap. Promoted values become much costlier to\ncollect later on.\n  \n  \n    At Jane Street, we often write “zero-allocation” code, which must\nnever trigger a GC cycle. A stack allocator would make it much\neasier to write programs that do not touch the heap.\n  \n\n\nWhen such performance concerns are relevant, one should arguably be\nusing a language based on explicit memory management, like Rust.\nHowever, garbage collection is genuinely useful; explicit management is\na burden on users. Ideally, a language could provide a spectrum of\nallocation strategies freely interoperable within a single\napplication. With modes, users can write OCaml with all the usual GC\nguarantees—but when performance is paramount, opt into the\nconsideration of lifetimes, ownership, and concurrency.\n\nLocal Variables\n\nIn OCaml, it turns out that many short-lived values can be\nstack-allocated. To safely refer to such values, we introduce\nlocal variables.\n\nDetermining whether a variable is local involves checking a certain\ncondition on its lifetime. 
Consider the following function:\n\nlet is_int str =\n    let opt = Int.of_string_opt str in\n    match opt with\n    | Some _ -&gt; true\n    | None -&gt; false\n;;\n\n\n\n  val is_int : string -&gt; bool\n\n  \n\n\nNaively, this function incurs a heap allocation. The compiler does\nnot know the lifetime of opt—our function could return it, or\neven store it in a global variable. Because opt could escape this\nfunction, the value referenced by opt may need to live forever.\nTherefore, it must be heap-allocated.\n\n\n\n\n\n\n\nAs the programmer, however, we can deduce that a shorter lifetime\nsuffices. In fact, opt only needs to live until we match on it.\nWhen is_int returns, opt is no longer accessible, so it could have\nsafely been allocated in stack memory local to is_int.\n\n\n\n\n\n\n\nSpecifically, opt is local because its lifetime does not exceed its\nenclosing stack frame, which we call its region. At runtime,\nentering is_int begins a region by saving the current stack pointer;\nexiting ends the region and reclaims stack-allocated memory. Since opt\nis only accessible within this region, it may safely be allocated in the\ncorresponding stack frame.\n\nNote that a stack-allocated value is not necessarily stored on the\ncontrol flow stack, as seen in languages that support alloca(). In\nthis example, we request space from a stack-based allocator backed by\nentirely unrelated memory.\n\nThe Locality Mode\n\nSo, local variables are those that do not escape their region. To\nformalize this constraint in a manner the compiler can check, we\nintroduce modes.\n\n\n  \n    By default, variables have the global mode. A global variable\nhas the capability to escape any region, so always references the\nheap.\n  \n  \n    Variables with the new local mode cannot escape their enclosing\nregion, so may refer to the stack.\n  \n\n\nA mode is attached to a variable upon declaration, either in a let\nbinding or in a function parameter.  
In both cases, the compiler will\ncheck that the value does not escape its region.\n\nlet foo (local x) =\n    let local y = 0 in\n    x, y\n;;\n\n\n\n3 | x, y\n    ^\nError: this value escapes its region.\n\n\nA local parameter represents a promise by the callee: the function\nwill not store a reference to the value anywhere that could be accessed\nafter the function returns. Intuitively, it’s safe to pass a\nstack-allocated value to a function if we know the value’s lifetime will\nnot be extended.\n\nlet is_empty (local str) =\n    String.length str = 0\n;;\n\n\n\n  val is_empty : string @ local -&gt; bool\n\n  \n\n\nHere, the syntax string @ local denotes that is_empty takes its parameter\n“at” the local mode.\n\nEven without explicit mode annotations, the compiler can statically\ndetermine which variables may escape their enclosing region. Such\nvariables are assigned the global mode; all others are automatically\ninferred to be local. At this point, the compiler may construct values\nbound to local variables using stack allocation.\n\nLocal Returns\n\nReturning a local value from a function should appear contradictory,\nsince a function’s result has clearly escaped its region. On the other\nhand, if functions can only return globals, constructing fully\nstack-allocated values becomes difficult—they can only be built up\nfrom literals. The solution:\n\nlet local_list () =\n    exclave [1; 2; 3]\n;;\n\n\n\n  val local_list : unit -&gt; int list @ local\n\n  \n\n\nThe exclave keyword ends the current region and executes the given\nexpression in the enclosing region. The caller receives a local\nvariable prohibited from escaping the caller’s region.  
Therefore,\nit’s safe to allocate that value on the caller’s stack frame—the\ndifference is simply which region the value lives in.\n\nlet bar () =\n    let list = local_list () in\n    list\n;;\n\n\n\n3 | list\n    ^^^^\nError: this value escapes its region.\n\n\nLocal-returning functions are the primary method of creating\nstack-allocated values, as they can programmatically build up local data\nstructures. This mechanism also allows functions to return their local\nparameters.\n\nLastly, recall that the ‘locals’ stack is distinct from the control flow\nstack, making this behavior easy to implement.\n\nLocality in APIs\n\nLocality doesn’t only facilitate stack allocation—it also lets\nus design safer APIs. The following code exhibits a common pattern\nfor resource management:\n\nCore_unix.with_file \"file\" ~mode:[ O_RDONLY ] ~f:(fun fd -&gt; (* ... *))\n\n\n\nHere, a file descriptor is opened, passed to a lambda function, and\nclosed after the function returns. This API lets users eschew manually\nclosing the file descriptor. However, there’s no guarantee that the\ndescriptor is not used after it’s closed.\n\nlet stash = ref 0 in\nCore_unix.with_file \"file\" ~mode:[ O_RDONLY ] ~f:(fun fd -&gt; stash := fd);\nCore_unix.close !stash\n\n\n\nException: Unix.Unix_error(Unix.EBADF, \"close\", ...)\n\n\nOf course, this design can be improved by making fd a local\nparameter. After changing the signature of with_file to the\nfollowing…\n\nval with_file : string -&gt; mode:open_flag list -&gt; f:(File_descr.t @ local -&gt; 'a) -&gt; 'a\n\n\n…the callback must promise not to stash away the file descriptor.\nTherefore, we know the file won’t be used after the callback returns.\n\nIn this example, we’re using modes to require a promise from the\ncaller. This usage might feel similar to local returns, and for good\nreason. 
Formally, when a parameter is used contravariantly, its mode\nrepresents a restriction on the callee, but when used covariantly\n(as seen here), it instead represents a restriction on the caller.\n\nModes vs. Types\n\nAbove, we declared a local integer x using the syntax let local.\nNotably, we didn’t simply add a type annotation—the local mode\ndoes not operate on types. In fact, the mode of x is entirely separate\nfrom the type of x.\n\nTypes describe data structures, that is, how to build up and take apart\nvalues.  On the other hand, a mode encodes a property independent of\ndata layout, so may be attached to a variable of any type.  To\nillustrate this behavior, type annotations specify a variable at a mode\nusing the syntax type @ mode.\n\nlet local x = 0\nlet local y = \"string\"\nlet local z = [0.0; 1.0]\n\n\n\n  val x : int @ local\nval y : string @ local\nval z : float list @ local\n\n  \n\n\nIn the case of locality, the salient property is whether a value may\nescape its region.  Variables with the global mode can escape any\nregion, so global values are correspondingly heap allocated.\nConversely, the local mode restricts a variable to its region; a local\nvalue may be stack allocated.\n\nLocality vs. Lifetimes\n\nEncoding locality with a mode has some advantages compared to Rust’s\ntype-centric approach. In Rust, reference types are parameterized over\nspecific regions represented by lifetime variables.  This design is more\nexpressive than locality, which only distinguishes values that may\nescape all regions from those that cannot escape any.\n\nOn the other hand, lifetime variables are a source of pervasive\ncomplexity.  When references are inherently polymorphic, essentially all\nfunctions become lifetime-polymorphic as well. 
For example, whenever a\nreference lacks a lifetime annotation, an implicit lifetime variable\nappears:\n\nfn print_string(s: &str);\n\n// Is equivalent to...\n\nfn print_string&lt;'a&gt;(s: &'a str);\n\n\n\nSince Rust supports first-class functions, the result is that\nhigher-order functions require higher-order polymorphism, for which type\ninference is undecidable in general.\n\nOCaml’s modes do not affect type inference—they preserve the types\nof existing code, so users truly don’t need to consider modes they\naren’t actively using.  In OCaml, type inference, higher-order\nfunctions, and garbage collection are all important parts of the\ndevelopment workflow, so we consider the local mode to be a good fit.\n\nModes are Deep\n\nAbove, we noted that a mode describes a property independent of data\nlayout.  Such properties are deep, as opposed to the shallow layout\nencoded by a type.  To understand this distinction, consider the\nfollowing type:\n\ntype 'a list =\n    | Empty\n    | More of 'a * 'a list\n\n\n\nDestructuring a value of type 'a list produces two possible outcomes:\neither the empty list, or a pair of a value and another list of\narbitrary shape. Hence, the type only describes the value’s top-level\nstructure.\n\nlet process list =\n    match list with\n    | Empty -&gt; (* ... *)\n    | More (head, remaining) -&gt; (* ... *)\n;;\n\n\n\nConversely, destructuring a global variable of type 'a list produces\neither an empty list, or a pair of a global value and another global\nlist. That is, if the root node of the list may escape its region, the\nsubsequent nodes clearly can too—so the entire list must be\nheap-allocated.\n\n\n\n\n\n\n\nThe same logic applies to the local case: destructuring a local\nlist produces a local value and another local list. 
It is possible to\ncreate a local list consisting entirely of stack allocations, so we\nmust ensure that the contents of a local list also do not escape.\n\n\n\n\n\n\n\nDeepness enables the compiler to validate usage of local data\nstructures:\n\nlet head (local list) =\n    match list with\n    | Empty -&gt; None\n    | More (head, _) -&gt; Some head\n;;\n\n\n\n4 | | More (head, _) -&gt; Some head\n                             ^^^^\nError: this value escapes its region.\n\n\nIf locality didn’t exhibit “deepness,” it wouldn’t be very\nuseful—we could stack allocate the root node of a list, but we’d\nhave no way to express that further nodes may also be stack-allocated.\n\nSub-Modes\n\nGiven deepness, locality might appear to be an “all or nothing”\nchoice—so far, we’ve allocated our data structures either entirely\non the stack or entirely on the heap.  To break this dichotomy, we will\nexplore another important property of modes: each mode axis admits a\nnatural sub-typing relation.\n\nIn the case of locality, it’s intuitively safe to use a global variable\nas if it were local. For example, a function expecting a local parameter\npromises equivalent behavior whether or not the parameter actually lives\non the stack. Therefore, we say global is a sub-mode of local\nand allow global values to be used at the local mode.\n\nlet localize x = exclave x\n\n\n\n  val localize : 'a -&gt; 'a @ local\n\n  \n\n\nIt is safe for a local value to reference a global, but not vice versa.\nAt runtime, this means we can create pointers from the stack to the\nheap, but not from the heap to the stack.  
For example, we can create a\nlocal, fully stack-allocated list whose nodes refer to heap-allocated\nvalues.\n\nlet rec localize list = exclave\n    match list with\n    | Empty -&gt; Empty\n    | More (head, remaining) -&gt; More (head, localize remaining)\n;;\n\n\n\n  val localize : 'a list -&gt; 'a list @ local\n\n  \n\n\n\n\n\n\n\n\nWe could also create a local list where only the first node is\nstack-allocated—say, if we locally append to a global list.\n\nlet local_cons (local head) remaining = exclave\n    More (head, remaining)\n;;\n\n\n\n  val local_cons : 'a @ local -&gt; 'a list -&gt; 'a list @ local\n\n  \n\n\n\n\n\n\n\n\nWhat we cannot create is a global list containing a stack-allocated\nnode. Again, modes are deep, so any global list must contain only\nglobal values. This preserves the invariant that whenever a node is\nheap-allocated, all nodes reachable from it are also heap-allocated.\n\nMore rigorously, we could say that as we traverse a value, the current\nmode monotonically increases with depth. This restriction should also\nmake intuitive sense, since any list without this property contains a\npointer from the heap to the stack. Such a pointer is a potential\nuse-after-free bug—the heap node may still be reachable after the\nstack frame has been freed.\n\n\n\n\n\n\n\nThe above layout can be represented using Rust lifetimes, which support\nsubtyping.  However, safely manipulating such data structures requires\nsignificantly more reasoning on the programmer’s part. Lifetime\nvariables refer to arbitrary regions, so subtyping relationships must be\nexplicitly specified.  Locality offers a compromise: considering just\none lifetime—the current region—makes efficient stack\nallocation easy to use in many practical scenarios. 
Values with other\nlifetimes are still managed by the garbage collector.\n\nGlobal Record Fields\n\nBecause modes are deep, a local record always contains local values.\nSuch fields may be stack allocated, so must be prohibited from escaping\nthe current region.  However, since global is a sub-mode of local, inner\nvalues may also be heap allocated—and sometimes the programmer\nknows they always will be.  In this case, locality is unnecessarily\nrestrictive.\n\nTherefore, we also support annotating record fields with an explicit\nglobal mode. The compiler forbids initializing a global field using a\nlocal variable.  Global fields are hence allowed to escape their region:\n\ntype 'a box = { global x : 'a }\n\nlet unwrap (local box) =\n    box.x\n;;\n\n\n\n  val unwrap : 'a box @ local -&gt; 'a\n\n  \n\n\nExplicitly mutable record fields are automatically considered global. If\nthis were not the case, a function could leak a local variable by\nstoring it within a local parameter, violating region safety. For\nexample:\n\ntype 'a box = { mutable x : 'a option }\n\nlet clear (local box) =\n    let local y = None in\n    box.x &lt;- y\n;;\n\n\n\n5 | box.x &lt;- y\n             ^\nError: this value escapes its region.\n\n\nLocality in Practice\n\nAt Jane Street, we’ve been using locality in production for some time.\nDevelopers who work on performance-sensitive systems use locals daily,\nand those who don’t are largely unfamiliar with the feature—which\nmeans we’ve successfully limited the costs to the users who care.\nTherefore, we consider locality’s expressivity and performance benefits\nwell worth the additional language complexity.\n\nBuilding on locality’s success, the compilers team is now implementing\nadditional modes for describing ownership constraints. In part two, we\nwill explore new mode axes representing uniqueness and linearity.\n",
        "url"      : "https://blog.janestreet.com/oxidizing-ocaml-locality/",
        "image"    : "https://blog.janestreet.com/oxidizing-ocaml-locality/oxidizing-ocaml-locality.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Building reproducible Python environments with XARs",
        "date"     : "April 14, 2023",
        "authorId" : "psenchanka",
        "author"   : "Pavel Senchanka",
        "tags"     : [],
        "minsToRead" : 10,
        "content"  : "Our traders and researchers love Python for its agility and for its huge\nopen-source ecosystem, especially when it comes to machine learning. But\nthe heavy use of notebooks can make it difficult to support. Notebooks\nhave a very different lifecycle than regular code, and aren’t always\nrigorously version controlled. And while most of our code (much of it\nwritten in OCaml) lives in a monorepo, putting all notebooks there is\ndifficult; many notebooks end up being stored all over the place.\n\nThat leads to a serious problem: notebooks living outside of the\nrepository depend on various libraries inside of it. We have to expose\nthose libraries somehow and manage their release and deployment cycle.\nThis is where Python environments come in: they are the layer between\nlibraries and notebooks. In an ideal world, they’d allow library authors\nto iterate quickly while still allowing notebooks to be upgraded at\ntheir own pace.\n\nOver the last few years we have become more convinced that declarative,\nreproducible, centrally built Python environments are where we want to\nbe. Unfortunately, a lot of open-source Python environment tools were\ndesigned for smaller, mutable environments. There have been some\nimprovements lately, notably Pipenv and Poetry for reproducibility and\nDocker for deployment. Unfortunately, we can’t use these tools off the\nshelf: our Python environments need to interact with OCaml code during\nthe build process; and Docker requires more privileges on the hosts it\nruns on than we’re comfortable allowing.\n\nSo instead we’ve developed our own system for building and deploying\nPython environments. This tool, which we call js-python, relies on our\ninternal build system for OCaml integration and uses a format called XAR\nfor deployment. We think it works pretty well: it allows us to easily\ncreate and manage new environments, which in turn allows more\ndecentralization and improves robustness. 
Let’s take a look at how we\narrived here, and what we like about it.\n\nHow we started with Conda and discovered our problems\n\nWe deployed our first widely shared Conda environment in 2018. Conda is\ngreat. It’s open-source, has lots of packages and online documentation.\nOur environment was working reasonably well and people started to use it\nmore. And they started to request more and more packages.\n\nPretty soon two major problems became very apparent. The first has to do\nwith how the environment is stored on the filesystem: it is just a\ndirectory with a bunch of files. This feels very normal in much of the\nworld, but some internal context complicates things. Because the\nenvironment was shared, it was deployed on a network filesystem. NFS is\na very powerful piece of technology, but our NFS server becomes slow\nwhen tens of thousands of files are read in a tight loop. Starting\nPython and running the usual library imports actually triggers this\nsituation. Starting a kernel from the local disk took less than one\nsecond, but starting the same kernel from NFS took on the order of\nminutes if you included all imports.\n\nSecond: the environment was not reproducible. We had a wiki page that\nhad instructions and records about how it was built, but it wasn’t\ncomplete and didn’t account for differences in machines and conditions\nduring the build. At some point people became cautious about installing\npackages because they were afraid they would break the environment in\nsubtle ways and wouldn’t be able to go back!\n\nSwitching to XAR files for fast startup and relocatability\n\nThe first major change came when we discovered the\nXAR\nfile format. At a very high level, a XAR file contains an entire\nfilesystem packed together into one file. It can be mounted and appear\nto the user as the usual tree of files. XARs use efficient compression\n(Zstd), and are pretty easy to work with. 
They are built on top of\nsquashfs,\nwhich has been used in the Linux world for a long time and has robust\ntooling.\n\nXARs seemed like a perfect fit for our NFS deployments: from the NFS\nperspective they are just a single file, so the reads are fast; but from\nthe Python perspective once the XAR is mounted it’s not that different\nfrom a normal Python environment.\n\nHowever, the reproducibility problem remained. It was difficult to write\na script that built an exact copy of our shared environment, and\nespecially one that could run in an isolated build system the way we\nwanted it to. Also, Conda relies on a central configuration and a\nstandard directory structure, which makes it somewhat tricky to build a\nConda environment in one place and then move it to another. These\nreasons made it difficult to integrate Conda with XARs, which are\ndesigned to be relocatable.\n\nFaced with these challenges, we decided to drop the idea of packing\nConda environments into XARs. Instead we’d try to take more control over\nhow the environment was structured and initialized. We wanted to keep\nthe structure simple so it would be easy to maintain and integrate with\nother tools. In the end we set up something similar to virtualenv using\nour XAR entrypoint script to configure environment isolation.\n\nIt worked, and it was fast! In the spirit of naming things boring names\nwe called the resulting environment format js-python. But it still\nwasn’t perfect.\n\nWhere XARs leak: dynamic components\n\nWe could build and deploy XARs pretty easily, and we were eager to move\nthe build process into our continuous integration system so we could run\ntests and catch bugs early. This was not something possible with our\nConda setup, but seemed doable with XARs.\n\nIn practice, though, we quickly ran into an issue: our XARs were not\nfully self-contained. 
Because environments were historically hard to\ndeploy, our library code was developed and released separately as two\nmajor components: pure Python libraries and Python bindings for OCaml\nlibraries. The environment itself only contained the interpreter and\nthird-party libraries. This made testing and releasing changes that\nspanned more than one component—like adding a third-party library to\nthe environment and using it in internal code—quite painful.\n\nFor the same reason, it was harder to troubleshoot production issues:\nlooking at the deployment history of any one component to find the cause\nof a breakage, you could potentially miss deployments or rollbacks of\nthe other two components. Rolling back during production issues was not\nfun.\n\nSo we did get some benefits of XARs, but because of the way our XARs\nwere structured, we didn’t get all of them.\n\nStatic bundling: making XARs actually self-contained\n\nIf we zoom out, the problem we were facing is the classic one: static or\ndynamic linking. Jane Street libraries were essentially dynamically\nlinked into the environment, with the pros and cons that came with it.\n\nOutside of Python, we normally build statically linked executables, and\nthat has a lot of advantages. Library owners can expect to be able to\nmake and release changes without risking breaking production jobs, and\noften with less consideration for versioning. The monorepo encourages\nthis as well: it’s harder to take full advantage of refactoring\ncapabilities that a monorepo provides if you have to care about your\nlibraries being dynamically linked into old executables.\n\nPython did not have to follow the same model. But if it did, testing and\ndeploying changes that span Python, OCaml, and external libraries would\nbe much easier. 
This is very attractive: there are hundreds of OCaml\nlibraries at Jane Street that could be useful in Python, and the lower\nthe barrier for exposing them, the more power Python users would have.\n\nSo we went ahead and converted the internal library and OCaml bindings\nto be statically linked into the environment! This resulted in a set of\nnew Python-specific build rules in our build system that allow you to\nspecify, at build time, how Python libraries depend on each other and on\nOCaml libraries.\n\nVersioning deployed environments: XARs need names\n\nNow that we could build and deploy environments continuously, one of the\nfirst user concerns was: how do I make sure my notebooks don’t break\nunexpectedly?\n\nHistorically our environments were used through their “nightly” instance.\nStatic linking means that when we don’t deploy “nightly”, the environment\ndoesn’t change (which is a good property!). But we do want to deploy\n“nightly” continuously, to provide new features and fixes to users\nquickly.\n\nTo accommodate both the users who want the latest changes, and those\nthat prefer stability, we introduced a distinction between “nightly” and\n“stable” tags. stable tags are created on a schedule, and once created\nthey never change. Both nightly and stable tags are exposed as notebook\nkernels, so users can easily switch between them when they want to\n“time-travel” between environment versions.\n\nA single-revision world\n\nAll of the above has led us to a state where a Python environment is\nexactly described by the revision of the code it was built from. Users\ncan switch between revisions (usually by means of stable tags), and\nmaintainers have full control over a deployed instance like\n“research/nightly”: when you roll it, you roll “everything”, and when\nthings break, you can roll back “everything”.\n\nWe're really happy with the results. 
It makes environments easier to\nmaintain without having to worry about interactions and varying roll-out\nschedules between third-party code and internal Python and OCaml code.\n\nThat makes it easier to mint new environments, which itself makes\nworkflows more isolated and makes it easier to distribute work across\nteams. In the new world, a critical notebook in Hong Kong might use a\ndedicated local environment, which means it won't get broken by a roll\nhappening out of New York. Ultimately, centralizing the build allows us\nto decentralize the environments more, which increases the overall\nresiliency of our Python code.\n\nIt comes at a cost though. Hotfixes are now much less trivial to make:\nyou cannot edit a XAR file in vim when you find a bug. At the very least\nyou have to build a new XAR file with the fix and deploy it. Even if you\nskip most of the normal pipeline, and you are familiar with the tools,\nit can still take 10 to 20 minutes. We’re working on bringing this time\ndown, but some traders consider this fundamentally too slow for running\ncritical applications. If you use Python code to provide knobs to your\nsystem, you expect to be able to modify it on the fly, not wait 20\nminutes. Python is a pretty good “rich config” format, so we don’t want\nto take away this possibility.\n\nThe way we think about this currently is that Python code is split into\n“core” and “leaf”. “Core” code is part of the environment: it has clear\nseparation between development and production phases. After it’s\ndeployed, it’s immutable. “Leaf” code exists outside of the environment.\nDifferent teams have different setups for it, but usually it involves\nsome lighter-weight version control than our main monorepo, and the\nability to edit the code in the deployment location. 
Leaf code is\nusually notebooks or thin scripts which call into core code.\n\nThe simplicity of the single-revision model is very attractive, and we\ntry to encourage users to put as much code as possible into the\nenvironment. But we don’t want to take away that last-mile agility that\nmany teams rely on, and the core/leaf distinction lets us strike a good\ncompromise.\n",
        "url"      : "https://blog.janestreet.com/building-reproducible-python-environments-with-xars/",
        "image"    : "https://blog.janestreet.com/building-reproducible-python-environments-with-xars/pycon.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "What if writing tests was a joyful experience?",
        "date"     : "January 9, 2023",
        "authorId" : "jsomers",
        "author"   : "James Somers",
        "tags"     : [],
        "minsToRead" : 17,
        "content"  : "At Jane Street we use a pattern/library called “expect tests” that\nmakes test-writing feel like a REPL session, or like exploratory\nprogramming in a Jupyter notebook—with feedback cycles so fast and\njoyful that it feels almost tactile. Having used them for some time now,\nthis is the only way I’d ever want to write tests.\n\nOther languages call these “snapshot” tests—see for example Rust’s\nexpect-test, which seems to have been inspired by our library, or\nJavaScript’s Jest. We were first put onto the idea ourselves by\nMercurial’s unified testing format, and so-called “cram” tests, for\ntesting shell sessions.\n\nIn most testing frameworks I’ve used, even the simplest assertions\nrequire a surprising amount of toil. Suppose you’re writing a test for\na fibonacci function. You start writing assert fibonacci(15) ==\n... and already you’re forced to think. What does fibonacci(15)\nequal? If you already know, terrific—but what are you meant to do\nif you don’t?\n\nI think you’re supposed to write some nonsense, like assert\nfibonacci(15) == 8, then when the test says “WRONG! Expected 8, got\n610”, you’re supposed to copy and paste the 610 from your terminal\nbuffer into your editor.\n\nThis is insane!\n\nHere’s how you’d do it with an expect test:\n\nprintf \"%d\" (fibonacci 15);\n[%expect {||}]\n\n\n\nThe %expect block starts out blank precisely because you don’t know\nwhat to expect. You let the computer figure it out for you. In our\nsetup, you don’t just get a build failure telling you that you want\n610 instead of a blank string. You get a diff showing you the exact\nchange you’d need to make to your file to make this test pass; and\nwith a keybinding you can “accept” that diff. The Emacs buffer you’re\nin will literally be overwritten in place with the new contents [1]:\n\n\nIt’s hard to overstate how powerful this workflow is. 
To “write a\ntest” you just drop an [%expect] block below some code and it will\nget filled in with whatever that code prints.\n\nJust the other day I was writing a tricky little function that rounds\nnumbers under an unusual set of constraints; it was exactly the kind\nof thing you’d want to write in a REPL or Jupyter notebook, to iterate\nquickly against lots of examples. All I had to do was write the\nfollowing right below my function:\n\nlet%expect_test \"Test the [round] function on [examples]\" =\n  Ascii_table.simple_list_table\n    [ \"n\"; \"f(n)\" ]\n    (List.map examples ~f:(fun n -&gt; [ n; round n ] |&gt; List.map ~f:string_of_float));\n  [%expect {||}]\n\n\n\nand voila my editor produced a little table of results. Naturally my\nfirst implementation had all kinds of bugs—some entries in the table\nlooked wrong. Improving the function became a matter of fiddling,\nobserving the diffs that produced, fiddling some more, and so on,\nuntil the table finally looked the way I liked. (Had I wanted, I could\nhave at that point used something like Quickcheck to do\nexhaustive fuzz testing.) The table meantime lived on as\ndocumentation—indeed for many functions, seeing a handful of example\ninputs and outputs is a lot clearer than a prose description.\n\nOf course, the table is not just an exploratory aid and a bit of\ndocumentation but also, you know, a test. If someone ever tweaks my\nfunction or any of its dependencies, the frozen output in the\n[%expect] block guards against unexpected behavior. In expect tests,\nregressions are just diffs.\n\n(In general, although it’s possible to inline tests right where the\ncode is written, at Jane Street we tend to clearly separate test and\nreal code. 
Tests live in their own directory and are written against\nthe public interface, or, when testing private implementations,\nagainst a For_testing module exported just for that purpose.)\n\nWhat’s wrong with regular old unit testing?\n\nBack when I worked at a Ruby web dev shop we used to write a lot of\ntests like the following, taken from a blog post about RSpec,\na popular Ruby testing framework:\n\nbefore do\n  @book = Book.new(:title =&gt; \"RSpec Intro\", :price =&gt; 20)\n  @customer = Customer.new\n  @order = Order.new(@customer, @book)\n\n  @order.submit\nend\n\ndescribe \"customer\" do\n  it \"puts the ordered book in customer's order history\" do\n    expect(@customer.orders).to include(@order)\n    expect(@customer.ordered_books).to include(@book)\n  end\nend\n\ndescribe \"order\" do\n  it \"is marked as complete\" do\n    expect(@order).to be_complete\n  end\n\n  it \"is not yet shipped\" do\n    expect(@order).not_to be_shipped\n  end\nend\n\n\n\nThis is a perfectly lovely test. But think: everything in those\ndescribe blocks had to be written by hand. The programmer first had\nto decide what properties they cared about—(customer.orders,\ncustomer.ordered_books, order.complete, order.shipped)—then\nalso had to say explicitly what state they expected each field to be\nin. Then they had to type it all out.\n\nMy main claim is that all that deciding and typing is painful enough\nthat it actually discourages you from writing tests. 
Tests become a\nbummer instead of a multi-tool that helps you:\n\n\n  visualize behavior as you hack on an implementation\n  express and document intent\n  freeze a carefully crafted version of that output to protect against\nregressions\n\n\nIf RSpec had expect tests one could have simply written:\n\nexpect_test \"#submit\" do\n  @book = Book.new(:title =&gt; \"RSpec Intro\", :price =&gt; 20)\n  @customer = Customer.new\n  @order = Order.new(@customer, @book)\n\n  @order.submit\n  p @customer.orders\n  p @order\n  expect \"\"\nend\n\n\n\nand all the same state would have been made visible.\n\nAren’t lazy tests bad tests?\n\nI hear you already: tests should be explicit. You want to define\nup front the properties you care about, the output you’re expecting,\nand so on. (Especially in TDD.)  You don’t want to just dump a bunch\nof state and leave it to the reader to sort out what’s going on. And\nyou don’t want to have to wait for your function to be written to be\nable to write tests for it.\n\nYou’re right! But expect tests can be just as targeted as a classical\nunit test. I can always print out order.shipped? and type the string\n\"false\" in my expect block. I can do this before I’ve written any\ncode and I’ll get the same sorts of errors as someone doing TDD with\nRSpec.\n\nThe difference is that I don’t have to do that. Or I can defer doing\nthat until after I’ve done the fast-and-loose thing of “just seeing\nwhat happens.” That’s the beauty of a blank expect block: it is an\ninvitation to the runtime to tell you what it’s thinking.\n\nOf course, one of the downsides of just dumping state without doing\nany filtering is that you can get lost in a bunch of irrelevant\ndetails, and it’s harder for the reader to know what’s important, both\nwhen they read the test the first time, and when a code change causes\nthe test output to change. 
It also makes it more likely that you’ll\npick up spurious changes.\n\nThus the art of expect tests is in producing output that tells a\nconcise story, capturing the state you care about. The best tests\ntake pains to elide unnecessary detail. Usually they use helper\nfunctions and custom pretty-printers to craft the output.\n\nWhen expect tests were first adopted at Jane Street, they spread like\nwildfire. Now they form the better part of our test suite,\ncomplemented in places by property-based\ntesting. Classical assertion-style unit tests still have\ntheir place—just a much smaller one.\n\nSome real expect tests\n\nThe tedium of writing your expected output by hand only grows with the\ncomplexity of your actual system. A table of numbers is one\nthing—imagine trying to describe the state of the DOM in a web\napplication or the state of an order book in a financial exchange.\n\nWeb UI tests\n\nHere’s an excerpt of a real test from a toy web app built using\nBonsai, Jane Street’s open-source web framework for OCaml. (Think\nReact or Elm.) One of Bonsai’s most powerful features is its ability\nto let you easily write realistic tests, in which you programmatically\nmanipulate UI elements and watch your DOM evolve.\n\nIn this example, we’re testing the behavior of a\nuser-selector. 
Whatever you type in the text box gets appended to a\nlittle “hello” message:\n\nlet%expect_test \"shows hello to a specified user\" =\n  let handle = Handle.create (Result_spec.vdom Fn.id) hello_textbox in\n  Handle.show handle;\n  [%expect\n    {|\n    &lt;div&gt;\n      &lt;input oninput&gt; &lt;/input&gt;\n      &lt;span&gt; hello  &lt;/span&gt;\n    &lt;/div&gt; |}];\n  Handle.input_text handle ~get_vdom:Fn.id ~selector:\"input\" ~text:\"Bob\";\n  Handle.show_diff handle;\n  [%expect\n    {|\n      &lt;div&gt;\n        &lt;input oninput&gt; &lt;/input&gt;\n-      &lt;span&gt; hello  &lt;/span&gt;\n+      &lt;span&gt; hello Bob &lt;/span&gt;\n      &lt;/div&gt; |}];\n\n\n\nNotice that there are two expect blocks. (This allows you to make\nmultiple assertions within a given scenario and to scope setup/helper\ncode to just that scenario.)\n\nThe first makes our UI visible, and the second—which contains a\ndiff—shows some behavior after you programmatically input some\ntext. Bonsai will even show you how HTML attributes or class names\nchange in response to user input. Tests can include mock server calls,\nand can involve changes not just to the UI but to the state that\ndrives it. With tests like these you can write an entire component\nwithout opening your browser.\n\nTests of low-level system operations\n\nOur popular magic-trace tool, which uses Intel\nProcessor Trace to collect and display high-resolution traces of a\nprogram’s execution, makes heavy use of expect tests. 
Some are simple,\nfor example this one that tests the program’s symbol demangler:\n\nlet demangle_symbol_test symbol =\n  let demangle_symbol = Demangle_ocaml_symbols.demangle symbol in\n  print_s [%sexp (demangle_symbol : string option)]\n;;\n\nlet%expect_test \"real mangled symbol\" =\n  demangle_symbol_test \"camlAsync_unix__Unix_syscalls__to_string_57255\";\n  [%expect {| (Async_unix.Unix_syscalls.to_string) |}]\n;;\n\nlet%expect_test \"proper hexcode\" =\n  demangle_symbol_test \"caml$3f\";\n  [%expect {| (?) |}]\n;;\n\nlet%expect_test \"when the symbol is not a demangled ocaml symbol\" =\n  demangle_symbol_test \"dr__$3e$21_358\";\n  [%expect {| () |}]\n;;\n\n\n\nOthers serve as a kind of stable documentation, giving visibility into\nthe guts of the running system—like this test that demonstrates\nwhat a trace of an OCaml exception will actually look like (shortened\nfor clarity):\n\nlet%expect_test \"A raise_notrace OCaml exception\" =\n  let ocaml_exception_info =\n    Magic_trace_core.Ocaml_exception_info.create\n      ~entertraps:[| 0x411030L |]\n      ~pushtraps:[| 0x41100bL |]\n      ~poptraps:[| 0x411026L |]\n  in\n  let%map () =\n    Perf_script.run ~ocaml_exception_info ~trace_scope:Userspace \"ocaml_exceptions.perf\"\n  in\n  [%expect\n    {|\n    23860/23860 426567.068172167:                            1   branches:uH:   call                           411021 camlRaise_test__entry+0x71 (foo.so) =&gt;           410f70 camlRaise_test__raise_after_265+0x0 (foo.so)\n    -&gt;      3ns BEGIN camlRaise_test__raise_after_265\n    -&gt;      6ns BEGIN camlRaise_test__raise_after_265\n    -&gt;      9ns BEGIN camlRaise_test__raise_after_265\n    -&gt;     13ns BEGIN camlRaise_test__raise_after_265\n    -&gt;     13ns BEGIN camlRaise_test__raise_after_265\n    -&gt;     13ns BEGIN camlRaise_test__raise_after_265\n    -&gt;     13ns BEGIN camlRaise_test__raise_after_265\n    -&gt;     14ns BEGIN camlRaise_test__raise_after_265\n    ...\n   |}]\n\n\n\nState 
machine tests\n\nHere’s a test from a toy system at Jane Street that processes\nmarketdata. (We use this system as part of one of our “dev teach-ins,”\ntwo-week internal classes put on for developers meant to expose them\nto different systems, libraries, ideas, and idioms from around the\nfirm: e.g. Advanced functional programming or Performance\nengineering.) The goal of this particular test is to show how the\nstate of a two-sided order book with “buys” and “sells” responds to an\nincoming order.\n\nTo write the test, all you have to do is set up the situation, then\ndrop a blank [%expect] block:\n\nlet d = create_marketdata_processor () in\n(* Do some preprocessing to define the symbol with id=1 as \"AAPL\" *)\nprocess_next_event_in_queue d\n  {|\n((timestamp (2019-05-03 12:00:00-04:00))\n (payload (Add_order (\n     (symbol_id 1)\n     (order_id  1)\n     (dir       Buy)\n     (price     10.00)\n     (size      1)\n     (is_active true)))))\n|};\n[%expect {||}];\n\n\n\nThe compiler then figures out what should go inside the block. You’ll\nfind that you get a build error telling you that it’s not supposed to\nbe blank. Accepting the proposed diff, you end up with a block like\nthis:\n\nprocess_next_event_in_queue d\n  {|\n((timestamp (2019-05-03 12:00:00-04:00))\n (payload (Add_order (\n     (symbol_id 1)\n     (order_id  1)\n     (dir       Buy)\n     (price     10.00)\n     (size      1)\n     (is_active true)))))\n|};\n[%expect {|\n((book_event\n  (Order_added ((order_id 1) (dir Buy) (price 10.0000000) (size 1))))\n (book\n  ((instrument_name AAPL)\n   (book ((buy (((price 10.0000000) (orders ((1 1)))))) (sell ()))))))\n|}];\n\n\n\nThis is beautiful: a plain-text representation of the state of your\nsystem. The expect block shows you the order book. By keeping the\norder book small and simple, you ensure the test is legible. 
But you\ndon’t need to make any specific assertions about it.\n\nCompare what you might write for that last block in RSpec-land:\n\nexpect @book[\"AAPL\"].sell to_be empty\nexpect @book[\"AAPL\"].buy[0].price to_equal 10\nexpect @book_events to.include(@order)\n\n\n\nExplicitly checking every aspect of the entire state of the order book\nwould be too tedious, so instead, you write a handful of what you\nthink are the most important assertions. This takes thinking, typing,\nand time.\n\nIt also leaves you vulnerable later, when someone borks the\nimplementation of the order engine. Let’s say that now it mangles the\nsize of orders as it adds them to the book. Whereas the handcrafted\nassertions above will continue to pass—you never said anything\nabout the size of the order on the book—the expect test will fail\nwith a nice little diff showing you that size 1 inadvertently became\nsize 100.\n\nOf course it is not always true that expect tests catch more than\nregular unit tests—you have exactly the same level of flexibility\nin each—but by relieving you from having to dream up exactly what\nyou want to assert, expect tests make it easier to implicitly assert\nmore. Ironically, they capture things you never expected them to.\n\nThe pleasure of plain text\n\nThis style of testing encourages you to make printing itself easy,\nbecause most tests involve little more than setting up some data and\nprinting it. And indeed at Jane Street, we use code generators (like\nppx_sexp_conv) that make it trivial to create a stringified\nrepresentation of just about any type. (You’ll have noticed above that\nwe lean heavily on S-expressions.)\n\nPeople find expect tests so convenient that they’ll sometimes go to\ngreat lengths to create helpers for producing plain text output, even\nin places where you might not expect it. 
For instance in\nHardcaml, an open-source DSL for writing FPGA simulations\nthat Jane Street now maintains, many of the tests feature square\nplain-text\nwaveforms\nthat show you exactly what e.g. your clock and clear lines are doing:\n\nlet%expect_test \"counter\" =\n  let waves = testbench () in\n  Waveform.print ~display_height:12 waves;\n  [%expect {|\n┌Signals────────┐┌Waves──────────────────────────────────────────────┐\n│clock          ││┌───┐   ┌───┐   ┌───┐   ┌───┐   ┌───┐   ┌───┐   ┌──│\n│               ││    └───┘   └───┘   └───┘   └───┘   └───┘   └───┘  │\n│clear          ││                        ┌───────┐                  │\n│               ││────────────────────────┘       └───────────────   │\n│incr           ││        ┌───────────────┐                          │\n│               ││────────┘               └───────────────────────   │\n│               ││────────────────┬───────┬───────┬───────────────   │\n│dout           ││ 00             │01     │02     │00                │\n│               ││────────────────┴───────┴───────┴───────────────   │\n│               ││                                                   │\n└───────────────┘└───────────────────────────────────────────────────┘\n  |}]\n\n\n\nToward better tests for all\n\nI hope this post encourages more people to try the “snapshot” style of\ntesting. My own experience with it is that I never want to go back to\na workflow where my computer isn’t finishing my tests for me. If\nnothing else, an editor integration that can take an expected result\nand put it in its proper place in an assertion goes a long way. 
Typing\nthose assertions by hand feels somewhat like fixing the formatting of\nsource code by hand: something I was perfectly content doing for years\nuntil a tool came along that made the previous practice seem faintly\nridiculous.\n\nFrom the looks of it, this idiom—which again we didn’t invent; we\nborrowed it from Mercurial, though I’m not sure if that’s the ur\nsource or if it goes further back—seems to be catching on more\nwidely. Maybe someday it’ll go truly mainstream.\n\nFootnotes\n\n[1] We used to call these things\nquine tests because in effect\nyou’re dealing with a program that knows how to print its own\nsource.\n",
        "url"      : "https://blog.janestreet.com/the-joy-of-expect-tests/",
        "image"    : "https://blog.janestreet.com/the-joy-of-expect-tests/expect.gif",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Accelerating zk-SNARKs - MSM and NTT algorithms on FPGAs with Hardcaml",
        "date"     : "December 7, 2022",
        "authorId" : ["aray","bdevlin","fquah","rayesantharao"],
        "author"   : null,
        "tags"     : [],
        "minsToRead" : 9,
        "content"  : "In 2022 a consortium of companies ran an international competition,\ncalled the ZPrize, to advance the state of\nthe art in “zero-knowledge” cryptography. We decided to have a go in\nour free time at submitting solutions to both the Multi-Scalar\nMultiplication (MSM) and Number Theoretic Transform (NTT) tracks,\nusing the same open source Hardcaml libraries\nthat Jane Street uses for our own FPGA development. We believe that by\nusing Hardcaml we were able to come up with designs more efficiently\nand robustly in the short competition period. These designs also\ninteract with the standard vendor RTL flow and so we hope they will be\nuseful to others.\n\nOur MSM solution, implemented on the BLS12-377 curve, beats all\ncurrent FPGA state of the art, including the recently released\nCyclone MSM and\nPipeMSM. It’s able to calculate 4\nrounds of 2^26 MSMs in 20.331s, an average of 5.083s per MSM,\nand won first place in the ZPrize MSM track. Our\npower-area-performance balanced NTT solution took second place in the\nZPrize NTT track. Full results are available\nhere.\n\nOur results and a bit of background on the competition are summarized\nbelow. For more detailed reading, we have created a Hardcaml ZPrize\nwebsite.\n\nZero-knowledge proofs\n\nZero-knowledge proofs\n(ZKPs) are\npowerful cryptography tools that allow a prover to prove that a\ncertain statement is true without revealing any other information to\nthe verifier. For example, for a given function F and publicly known\nx, a prover can show F(x,w) = y, without revealing w to the\nverifier.\n\nThis property means ZKPs are attractive in contexts where online\nprivacy is paramount, for instance anonymous voting. They also form\nthe backbone of certain blockchain features\n(zk-Rollups)\nand “Web3” applications, and\nhave become increasingly popular in recent years.\n\nOne class of ZKP getting attention lately is called “Zero-knowledge\nSuccinct Non-interactive Arguments of Knowledge” (zk-SNARKs). 
These\nrequire no interaction between the prover and verifier, and are\ncompact and quick to verify.\n\nThe problem is that while verifying is relatively fast, constructing\nthe proof of a zk-SNARK can be quite time-consuming—minutes or even\nhours when there is a large number of constraints. Most of this time\nis spent in the calculation of NTTs and MSMs (roughly 30% and 70%\nrespectively). Current systems that use zk-SNARKs tend to have\nconstraint sizes in the millions.\n\nBefore describing our specific solutions, we’ll briefly introduce the\nunderlying cryptographic primitives.\n\nElliptic curve cryptography\n\nElliptic curve cryptography (ECC) allows for smaller keys compared to\nnon-EC cryptography such as RSA (modular exponentiation based on\nplain Galois fields), to provide the same level of security. For\nexample, a small 228-bit ECC key requires as much time to crack as a\nmuch larger 2,380-bit RSA key. A smaller key here means cryptographic\nsystems and the data transfer involved can be much more compact and\nfaster.\n\nAll ECC calculations take place within Fp, a finite field\nof integers modulo a large prime p, and in particular are performed\nover cyclic groups which have a generator g (all elements in the\ngroup can be generated from this point).\n\nBasic operations on an elliptic curve are point addition and point\ndoubling, which are used repeatedly in the MSM algorithm. In order to\nimplement point operations, we are performing multiplications and\nadditions modulo a prime. These operations can further be optimized\nwith efficient modulo reduction algorithms such as Barrett or\nMontgomery, and better-than-O(n^2) multiplication\ntechniques such as the Karatsuba algorithm.\n\nzk-SNARKs make use of ECC primitives, through several well-known\nprover algorithms such as\nGroth16. 
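\nThe multiplication technique just mentioned is easy to sketch in software. Below is a minimal, illustrative Python version of the Karatsuba recursion on ordinary big integers; the function name and the 64-bit cutoff are choices made for this sketch, and real FPGA implementations instead decompose fixed-width (e.g. 377-bit) operands into DSP-sized limbs:\n

```python
def karatsuba(x, y, cutoff_bits=64):
    # Below the cutoff, fall back to the builtin multiply.
    n = max(x.bit_length(), y.bit_length())
    if n <= cutoff_bits:
        return x * y
    # Split both operands around the midpoint.
    half = n // 2
    mask = (1 << half) - 1
    x_lo, x_hi = x & mask, x >> half
    y_lo, y_hi = y & mask, y >> half
    # Three recursive multiplies instead of four:
    lo = karatsuba(x_lo, y_lo, cutoff_bits)
    hi = karatsuba(x_hi, y_hi, cutoff_bits)
    mid = karatsuba(x_lo + x_hi, y_lo + y_hi, cutoff_bits) - lo - hi
    return (hi << (2 * half)) + (mid << half) + lo
```

\nThe point of the recursion is the operation count: replacing one n-bit multiply with three n/2-bit multiplies is what gives the better-than-O(n^2) behaviour referred to above.\n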
But the problem is that\nthe more complex the zk-SNARK we want to implement, the more constraints\nwe need—which translates directly into very large polynomials.\n\nIt’s the operations on these large polynomials (order 2^26\nand up) that require fast MSM and NTT solutions. NTTs are used to\nachieve an O(N log N) rather than O(N^2) time complexity\nin polynomial multiplication; MSMs are needed for the exponentiation\nof the elliptic curve points, described more in the next section.\n\nIn order to accelerate zk-SNARKs, we need\nto focus on both MSM and NTT problems, which in turn require novel\noptimizations in their ECC primitives—a full-stack solution.\n\nOur solutions to the MSM and NTT problems\n\nMSM\n\nIn general, the MSM problem is to take a list of scalars and points\nand compute the sum of each of the points scaled by its corresponding\nscalar. For the MSM prize track, we were tasked with performing the\nMSM computation over a fixed set of 2^26 elliptic curve\npoints from the BLS12-377 G1\ncurve and a randomly sampled\nset of scalars from the corresponding scalar field. (Because the\npoints are fixed, this is sometimes referred to as a “fixed-base\nMSM”.)\n\nTo solve this problem, we implemented a heavily optimized version of\nPippenger’s algorithm (explained\nhere).\n\nWe implemented a solution that uses the FPGA to do the vast majority\nof the computational work, while the host does a much smaller set of\ncomputations to obtain the final result. By splitting the work between\nthe x86 host and the FPGA, we were able to focus on optimizing the\nalgorithm for implementation on an FPGA while leaving seldom-seen\ncorner cases to the host. 
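\nThe bucket idea at the heart of Pippenger’s algorithm can be sketched in a few lines of Python. Plain integers stand in for elliptic curve points below (so point addition becomes integer addition, and the doubling passes become a shift); the function and its window size are illustrative choices for this sketch, not the parameters of our FPGA design:\n

```python
def pippenger_msm(scalars, points, window_bits=4):
    # Integers stand in for curve points: += is point addition,
    # and the shift below stands in for window_bits doublings.
    max_bits = max(s.bit_length() for s in scalars)
    total = 0
    # Process the scalars window by window, most significant first.
    for shift in reversed(range(0, max_bits, window_bits)):
        buckets = [0] * (1 << window_bits)
        for s, p in zip(scalars, points):
            digit = (s >> shift) & ((1 << window_bits) - 1)
            buckets[digit] += p  # bucket accumulation
        # Combine buckets: sum of k * buckets[k] via a running suffix sum.
        running, window_sum = 0, 0
        for b in reversed(buckets[1:]):
            running += b
            window_sum += running
        total = (total << window_bits) + window_sum
    return total
```

\nThe inner bucket-accumulation loop is the part that dominates the work, which is why performing it on the FPGA (with the partial bucket sums combined on the host) accelerates the whole algorithm.\n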
One challenge with\nsplitting the computation like this was architecting the system\ncommunication so that the FPGA could start streaming and processing the\nnext round of calculations while, in parallel, the host finished\nup the previous batch.\n\nThe core of our design is a fully pipelined, optimized elliptic curve\npoint adder. The majority of the work in Pippenger’s algorithm\nconsists of adding the points into buckets based on their\ncoefficients. By performing all of these bucket accumulations on the\nFPGA, we accelerate the entire algorithm. However, because elliptic\ncurve point addition is a complex operation, our pipelined adder has\nover 200 pipeline stages! Flushing the pipeline every time we hit a\ndata hazard while adding points into the same bucket would destroy our\nthroughput, so we also implement a controller which coordinates\nstalling and adding points into buckets to avoid data hazards. The\nFPGA then streams these partial bucket sums back to the host, where\nthey are combined to produce the final result.\n\n\n\nOverview of our MSM solution\n \n\nWe made a number of optimizations, both algorithmic and\nengineering-wise, across the full stack of the solution, including\nfinite field operations, elliptic curve arithmetic, and Pippenger’s\nalgorithm. For a detailed discussion of our techniques and\noptimizations, see our\nwebsite.\n\nResults\n\nWe implemented our MSM solution on an AWS f1.2xlarge instance, which\ncosts $1.65 per hour, over various input sizes up to 2^26\ninputs. The FPGA card uses a Xilinx UltraScale+ VU9P chip. We report\nboth the total latency for 4 rounds of MSM, as well as the latency for\na single round, over several different MSM sizes. 
“Masked 1 round\nlatency” is the average of 4 rounds, taking into account our\noptimizations that allow host and FPGA work to be parallelized.\n“Unmasked 1 round latency” means we explicitly calculate one round,\nwhich shows the overhead of doing work on the host.\n\nInput size | 4 round latency (s) | Masked 1 round latency (s) | Unmasked 1 round latency (s)\n2^26 | 20.331 | 5.083 | 5.518\n2^25 | 10.398 | 2.600 | 2.989\n2^24 | 5.967 | 1.492 | 1.724\n2^23 | 3.901 | 0.975 | 1.092\n2^22 | 2.883 | 0.720 | 0.779\n\nAverage running power used by the FPGA design is 52 watts regardless\nof input size. For detailed resource usage and even more detailed\nlatency breakdowns see our\nwebsite.\n\nNTT\n\nThe Number Theoretic Transform (NTT) is very similar to the Fast\nFourier Transform (FFT), except that it operates over a finite field\ninstead of complex numbers. NTTs replace the usual twiddle factors\nyou see in an FFT with powers of a primitive root of unity w in the\nfinite field, where w is an nth root of unity, i.e. w^n = 1 modulo some\nlarge prime number. In this competition, we were required to use the\nSolinas prime\n2^64 - 2^32 + 1.\n\nOur NTT solution was written to target a C1100 FPGA accelerator card,\nand implements the 4-step\nalgorithm over a\n2^24-point NTT.\n\nBy choosing this algorithm we were able to decompose the original NTT\ninto many smaller ones that can be run in parallel. The smaller NTTs,\neach of size 2^12, can easily fit in the available SRAM\nresources on the FPGA. 
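As a concrete illustration of the 4-step decomposition, here is a small Python sketch over the same Solinas prime, using a naive O(n^2) transform for the sub-NTTs and a tiny size (16 = 4 × 4) so it can be checked against the direct definition. The choice of 7 as a field generator and the toy size are our assumptions for the sketch; the actual design runs a 2^24-point NTT split into 2^12-point transforms.

```python
# 4-step NTT sketch over the Solinas prime p = 2^64 - 2^32 + 1:
# column NTTs with root w^n2, a twiddle multiplication by w^(j2*k1),
# then row NTTs with root w^n1 (the transpose is implicit in indexing).

P = 2**64 - 2**32 + 1

def naive_ntt(x, w):
    n = len(x)
    return [sum(x[j] * pow(w, j * k, P) for j in range(n)) % P for k in range(n)]

def four_step_ntt(x, w, n1, n2):
    assert len(x) == n1 * n2
    # Step 1: n1-point NTTs down the "columns" (stride n2), root w^n2.
    cols = [naive_ntt([x[j1 * n2 + j2] for j1 in range(n1)], pow(w, n2, P))
            for j2 in range(n2)]
    # Step 2: twiddle each element by w^(j2*k1).
    b = [[cols[j2][k1] * pow(w, j2 * k1, P) % P for j2 in range(n2)]
         for k1 in range(n1)]
    # Step 3: n2-point NTTs along the rows, root w^n1.
    out = [0] * (n1 * n2)
    for k1 in range(n1):
        row = naive_ntt(b[k1], pow(w, n1, P))
        for k2 in range(n2):
            out[k2 * n1 + k1] = row[k2]
    return out

n = 16
w = pow(7, (P - 1) // n, P)        # an nth root of unity: w^n = 1 mod P
x = [(i * i + 3) % P for i in range(n)]
assert four_step_ntt(x, w, 4, 4) == naive_ntt(x, w)
```

In hardware, each sub-NTT is small enough to run out of on-chip SRAM, and the independent column and row transforms are what the parallel cores execute.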
These smaller transforms were implemented using\nthe well-known Cooley-Tukey\nFFT\nalgorithm adapted for a finite field.\n\nWe decided to implement 8 parallel cores as a cluster, as this matched\nour HBM bus width of 512 bits. Each core inputs and outputs 64 bits\nand has its own field multiplier and two adders. All the cores in a\ncluster share a single controller. We can then instantiate multiple\nclusters of cores over the FPGA to achieve high parallelism. The\ndiagram below highlights the blocks and data-flow between off-chip\nmemory (HBM) and on-chip SRAM.\n\n\n\n NTT core scaling\n \n\nWe have a much more detailed writeup of the algorithms used, plus\nsource code and instructions for building from scratch\nhere.\n\nResults\n\nWe experimented with different core counts on a single Varium C1100\ncard, with results listed below for a full 2^24 NTT:\n\nCores | Latency (s) | Power (W) | LUTs | Registers | DSP | BRAM36 | URAM\n8 | 0.2315 | 16.97 | 107291 | 141006 | 260 | 162 | 48\n16 | 0.1238 | 18.19 | 126422 | 156149 | 512 | 162 | 96\n32 | 0.0691 | 21.13 | 166488 | 184436 | 1028 | 162 | 192\n64 | 0.0450 | 27.70 | 265523 | 246385 | 2052 | 162 | 384\n\nFuture work\n\nDue to time constraints, there were a number of optimizations that we\ndid not have a chance to experiment with. We discuss them at length on\nthe corresponding page of our\nwebsite.\n\n\n\n",
        "url"      : "https://blog.janestreet.com/zero-knowledge-fpgas-hardcaml/",
        "image"    : "https://blog.janestreet.com/zero-knowledge-fpgas-hardcaml/hardcaml-zprize.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Visualizing information propagation in markets",
        "date"     : "November 23, 2022",
        "authorId" : "richeng",
        "author"   : "Ricson Cheng",
        "tags"     : [],
        "minsToRead" : 5,
        "content"  : "The Dojima rice market, established around 1716, is widely considered to\nbe the world’s first organized futures exchange. Instead of directly\nexchanging money for rice on the spot, merchants would agree on a price\nand future date at which rice and money would be exchanged. This allowed\nfarmers and consumers to hedge their risk. As a result, information\nabout the abundance or lack of rice would travel across the country as\nfast as rice merchants carried it.\n\nIn Kumagae Onna Amigasa, author Bunryu Nishiki describes how a rice\nmerchant from Koriyama named Yomiji would convey these price signals\nfrom Dojima, Osaka to Koriyama in order to make profitable trades.\n\n\nIn order to get information about daily prices at the rice exchange,\nYomiji hired a regular express messenger \\... A minute after the\nrice exchange started the business of the day in Osaka, the\nmessenger wearing red hat and red gloves ran like a flying bird and\narrived at Kuragari Pass \\... If he raised his left hand by 1\ndegree, it meant the rice price increased by 1 bu of silver. If he\nraised his right hand by 1 degree, it meant the rice price decreased\nby 1 bu of silver. His role was to inform Yomiji of the increases\nand decreases of rice prices. Yomiji saw the person's signals from\nthe second floor of the wholesale store using a telescope which had\na range of 10 miles, and bought or sold rice taking these price\nchanges into consideration \\... As Yomiji knew the rice prices\nearlier than anyone when the information was delivered to Kuragari\nPass, there was not a single day that Yomiji did not make\nmoney. Other merchants had no idea about his scheme. They gave\nYomiji the nickname \"Forecasting Yomiji\" and rice prices in Koriyama\ncame to be greatly influenced by Yomiji's transactions. 
[0] [1]\n\n\nToday, price signals can travel much faster than anyone can run, but\nstill no faster than the light which entered Yomiji’s telescope.\nInformation propagates through fiber optic cables that span the oceans.\nUsing data from several futures exchanges around the world, we put\ntogether a visualization to show how this happens.\n\n*\n\nSome of the most heavily traded products in the world are equity index\nfutures. An equity index, like the S&P 500, represents a weighted basket\nof stocks. For example, the S&P 500 is representative of the entire U.S.\nstock market, whereas the FTSE 100 Index is representative of publicly\ntraded firms in the UK. An equity index future allows people to bet on,\nor hedge their exposure to, the price of these indices at a future point\nin time.\n\nLike rice futures, equity index futures trade in many geographic\nlocations. Futures contracts on the S&P 500 Index trade in Chicago.\nFutures on the FTSE 100 Index are traded just outside of London. Nikkei\n225 futures are traded in Tokyo, Hang Seng China Enterprises futures in\nHong Kong, ASX SPI 200 futures in Sydney, and Euro Stoxx 50 futures in\nFrankfurt.\n\nBut unlike rice, equity indices aren’t fungible with each other—you\ncan’t freely substitute one basket of stocks for another. However, the\nprices of these indices are still closely related to each other, and\noften move in tandem—such moves might be driven by news or other\nexogenous factors. Part of Jane Street’s role in the markets is to\nkeep these prices fair and in line with each other.\n\nWe can visualize the propagation of information between various equity\nindex futures by examining their price movements. We say that the price\nhas moved if it changes by more than 0.01% in five milliseconds. We can\nthen correlate a movement in one future with a movement in another\nfuture, if the movements happen right after each other.\n\nThat’s what the visualization below shows. 
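The movement rule above can be sketched in a few lines. Below is a toy Python illustration with invented timestamps and prices; the actual analysis also correlates movements across exchanges, which we omit here.

```python
# Flag a "move" whenever the price changes by more than 0.01% within
# five milliseconds. Timestamps are in milliseconds; data is invented.

THRESHOLD = 0.0001   # 0.01%
WINDOW_MS = 5

def detect_moves(ticks):
    """ticks: list of (timestamp_ms, price). Returns (timestamp, +1/-1) moves."""
    moves = []
    for i, (t1, p1) in enumerate(ticks):
        for t0, p0 in ticks[:i]:
            if t1 - t0 <= WINDOW_MS and abs(p1 - p0) / p0 > THRESHOLD:
                moves.append((t1, 1 if p1 > p0 else -1))
                break
    return moves

ticks = [(0, 4500.00), (3, 4500.25), (4, 4501.00), (12, 4501.05)]
print(detect_moves(ticks))   # -> [(4, 1)]
```

Only the jump at t = 4 ms clears both the size and time thresholds; the small drifts before and after it are filtered out.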
We selected some data\nduring a Federal Reserve announcement because it’s a particularly\nvolatile time, which makes it easier to observe this phenomenon. In\nthis time span, price movements are primarily driven by the\nannouncement, so we should expect information to flow outward from the\nS&P 500 to the rest of the world.  For that reason, we centered the\nvisualization around Chicago.\n\n\n\n\n\nIn this visualization, we use a green or red marker around each location\nto indicate whenever the price moves up or down (and we filter out\nmovements which couldn’t be correlated with each other). As a visual\naid, we draw two rings propagating outward from Chicago, the first one\nmoving at the speed of light, and the second one moving at half the\nspeed of light. The vertical stripes on the globe show time-zone\nboundaries.\n\nWe’ve slowed down the video to 1/30th of real time so that it’s easier\nto understand what’s going on. If you look closely, you can see that\nmost price movements occur between the first and second rings. This\nshows us that information can never travel faster than the speed of\nlight, but often isn’t far behind—fiber optic cables carry signals at\nan appreciable fraction of c.\n\n---\n\n[0] “The Dojima Rice Market and the Origins of Futures Trading,” Moss and Kintgen\n\n[1] Bunryu Nishiki, Kumagae Onna Amigasa (Kumagae Lady’s Hat), written\nin 1706, translated by Mayuka Yamazaki\n",
        "url"      : "https://blog.janestreet.com/visualizing-information-propagation-in-markets-index/",
        "image"    : "https://blog.janestreet.com/visualizing-information-propagation-in-markets-index/featured.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Computations that differentiate, debug, and document themselves",
        "date"     : "November 17, 2022",
        "authorId" : "asrinivasan",
        "author"   : "Aditya Srinivasan",
        "tags"     : [],
        "minsToRead" : 27,
        "content"  : "One of the problems we wrestle with at Jane Street is how to\nunderstand and manage the costs associated with the positions we hold:\nthings like margin, financing costs, market risk, regulatory capital\nrequirements, and so on.  To that end, we’ve built systems that\nestimate these costs and propose ways to reduce them. Essentially,\nthis is a numerical optimization problem.\n\nThis post describes a library we’ve developed to make that task\neasier, called Gradient_calculator. With this library in hand,\nwe can write a computation just once, and get the ability to:\n\n\n  Evaluate the computation to get its value\n  Differentiate the computation with respect to all its variables\nautomatically\n  Debug it by inspecting intermediate values up to\narbitrary levels of granularity\n  Document it by auto-generating a LaTeX formula from the\ndefinition\n\n\nAnd all of this functionality is provided while allowing programmers\nto express their computations in a natural style.\n\nIt’s worth saying that the core technique that powers\nGradient_calculator – automatic differentiation – is\nby no means novel and in fact goes all the way back to Fortran in the\nmid-60s. This post by Jane Street’s Max Slater explores\nthe wider field of differentiable programming in more detail. The\ntechnique has become increasingly popular in the last decade or so,\nespecially in the context of training neural nets.\n\nSo lots of libraries that do this sort of thing already exist. Why did\nwe build our own?\n\nWe ruled out some great open-source toolkits like JAX simply\nbecause they’re not easily interoperable with OCaml, which is what\nmost of our production systems are written in. 
But even frameworks\nwith OCaml bindings, like OWL and ocaml-torch,\ntended to model computations abstractly as operations over vectors and\nmatrices (or, more generally, tensors), which did not seem like the\nmost natural way to read, write, or think about the computations we\ntypically want to model ourselves. Our computations also aren’t as\nblack-box-y as a neural net; often there’s a well-defined equation\nthat describes them.\n\nBy developing our own library specifically for these kinds of\ncomputations, we could really focus on making it work well in our\ncontext.\n\nWhat’s more, we had greater control over its design and functionality,\nand capitalized on that by adding support for debugging and\ndocumenting computations, features we didn’t find in existing\nsolutions.\n\nA toy example\n\nLet’s demonstrate how this all works with a toy example, using the\nComputation module exposed by the Gradient_calculator\nlibrary. Suppose we have the following equation: f(x, y) = (x + (y + 1)^2)^2,\nwhere initially x = 2 and y = 4. Expressed as a computation,\nit looks like:\n\nopen! Computation\n\nlet computation =\n  let var name initial_value =\n    variable (Variable.create ~id:(Variable.ID.of_string name) ~initial_value)\n  in\n  let x = var \"x\" 2. in\n  let y = var \"y\" 4. in\n  square (sum [x; square (sum [ y; constant 1.0 ])])\n;;\n\n\n\nIf we do the math by hand, we would find that the partial derivatives\nof this are:\n\n∂f/∂x = 2(x + (y + 1)^2) and ∂f/∂y = 4(y + 1)(x + (y + 1)^2),\nwhich evaluate to 54 and 540 respectively at the initial values.\n\nAnd indeed, we can confirm that Computation knows how to compute this\ntoo, with a simple expect test.\n\nlet%expect_test _ =\n  Computation.For_testing.print_derivatives computation;\n  [%expect\n    {|\n    ┌──────────┬─────────┐\n    │ variable │   ∂f/∂v │\n    ├──────────┼─────────┤\n    │        x │  54.000 │\n    │        y │ 540.000 │\n    └──────────┴─────────┘ |}]\n;;\n\n\n\nThe library API, simplified\n\nHere is a simplified, stripped-down version of the Computation\nmodule:\n\nopen! Core\n\n(** A computation involving some set of variables. 
It can be evaluated, and\n    the partial derivative of each variable will be automatically computed. *)\ntype t\n\nmodule Variable : sig\n  (** An identifier used to name a variable. *)\n  module ID : String_id.S\n\n  (** A variable in a computation. *)\n  type t\n\n  val create : id:ID.t -&gt; initial_value:float -&gt; t\n\n  (** Returns the current value of this variable. *)\n  val get_value : t -&gt; float\n\n  (** Returns the current partial derivative of this variable. *)\n  val get_derivative : t -&gt; float\n\n  (** Sets the current value of this variable. *)\n  val set_value : t -&gt; float -&gt; unit\nend\n\n(** Constructs a computation representing a constant value. *)\nval constant : float -&gt; t\n\n(** Constructs a computation representing a single variable. *)\nval variable : Variable.t -&gt; t\n\n(** Constructs a computation representing the sum over some [t]s. *)\nval sum : t list -&gt; t\n\n(** Constructs a computation representing the square of [t]. *)\nval square : t -&gt; t\n\n(** [evaluate t] evaluates the computation [t] and returns the result, and\n    updates the derivative information in the variables in [t]. *)\nval evaluate : t -&gt; float\n\n\n\nThe key points to take away here are:\n\n\n  \n    Computation.ts can be constructed directly, for example via\nconstant and variable, or by composing existing\nComputation.ts (e.g. via sum and square).\n  \n  \n    We store information about the values and partial derivatives of\neach variable, and the latter is updated whenever the computation is\nevaluated.\n  \n\n\nThis API hides the internal details related to computing derivatives,\nand packages everything up in a uniform way: values of type\nComputation.t. This lets us easily build on top of this\nabstraction. 
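To make that shape concrete, here is a deliberately miniature Python analogue of the API (our own illustration, not the library's implementation): each node's evaluate returns its value together with a dictionary of partial derivatives, combining its children via the sum and chain rules in a single forward pass.

```python
# Mini forward-mode AD in the shape of the Computation API:
# evaluate() returns (value, {variable_name: partial_derivative}).

class Computation:
    def __init__(self, evaluate):
        self._evaluate = evaluate  # () -> (value, {var_name: partial})

    def evaluate(self):
        return self._evaluate()

def constant(c):
    return Computation(lambda: (c, {}))

def variable(name, value_ref):
    # value_ref is a one-element list, so the variable can be mutated
    # between evaluations (like Variable.set_value).
    return Computation(lambda: (value_ref[0], {name: 1.0}))

def sum_(ts):
    def ev():
        value, derivs = 0.0, {}
        for t in ts:
            v, d = t.evaluate()
            value += v
            for name, dv in d.items():
                derivs[name] = derivs.get(name, 0.0) + dv
        return value, derivs
    return Computation(ev)

def square(t):
    def ev():
        v, d = t.evaluate()
        return v * v, {name: 2.0 * v * dv for name, dv in d.items()}
    return Computation(ev)

# The toy example from the post: f = (x + (y + 1)^2)^2 at x = 2, y = 4.
x, y = [2.0], [4.0]
f = square(sum_([variable("x", x), square(sum_([variable("y", y), constant(1.0)]))]))
value, derivs = f.evaluate()
print(value, derivs)   # 729.0 {'x': 54.0, 'y': 540.0}
```

The partials reproduce the 54 and 540 from the expect test above, and mutating `x[0]` or `y[0]` then re-evaluating plays the role of set_value followed by evaluate.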
For example, we can write a gradient descent algorithm\nthat operates on any given Computation.t: all we need is the\nability to evaluate it and extract information about its partial\nderivatives at each step, which every Computation.t allows us to\ndo.\n\nThe library also provides a special function that lets one specify a\nblack-box calculation in which the cost and partial derivatives are\ncomputed “by hand”, likewise packaging the result up as a\nComputation.t. This is useful when the base primitives are\ninsufficient, or when the computation has already been implemented\nelsewhere.\n\nA peek under the hood\n\nAs described earlier, internally, a Computation.t is represented\nas an expression tree, where each node is either a leaf representing\nsome terminal value (like a constant or a variable) or an internal\nnode performing some operation over other nodes (e.g. a summation or\nsquare).\n\nUpon constructing a Computation.t, no evaluation is actually\nperformed. It’s only when evaluate is called that we do any\nwork. In particular, we’ve implemented forward-mode AD, in which we\nare performing the evaluation of our function and computing the\npartial derivatives at the same time, in a single “forward” pass. The\ncost is approximately proportional to the cost of evaluating the\nfunction itself, since we’re mostly just doing some constant\nadditional work for each operation. This is much better than numerical\ndifferentiation.\n\nFor operations involving frequent, nested applications of the chain\nrule, reverse-mode AD can be even more efficient than forward-mode\nAD. However, implementations of reverse-mode AD require tracking more\nintermediate state and are generally more complicated. 
Given that the\ncomputations we were modeling involve few nested applications of the\nchain rule, the performance benefits of reverse-mode AD did not\noutweigh its complexity costs.\n\nReal-world example: calculating the net market value of our positions\n\nLet’s try this out with an example that is more representative of\ncomputations we typically write in real systems. Suppose we want to\ncompute the absolute net market value of our positions, with\npositions netted within our accounts at each bank. First, let’s mock\nout our positions and some market prices.\n\nlet positions : float Ticker.Map.t Account.Map.t Bank.Map.t =\n  [ \"AAPL\", \"BANK A\", \"ACCOUNT W\", 10\n  ; \"AAPL\", \"BANK B\", \"ACCOUNT X\", -20\n  ; \"AAPL\", \"BANK A\", \"ACCOUNT Y\", 20\n  ; \"AAPL\", \"BANK B\", \"ACCOUNT Z\", 10\n  ; \"GOOG\", \"BANK A\", \"ACCOUNT W\", -5\n  ; \"GOOG\", \"BANK B\", \"ACCOUNT X\", 30\n  ; \"GOOG\", \"BANK A\", \"ACCOUNT Y\", 15\n  ; \"GOOG\", \"BANK B\", \"ACCOUNT Z\", -30\n  ]\n  |&gt; List.map ~f:(fun (ticker, bank, account, quantity) -&gt;\n    Bank.of_string bank, (Account.of_string account, (Ticker.of_string ticker, Int.to_float quantity)))\n  |&gt; Map.of_alist_multi (module Bank)\n  |&gt; Map.map ~f:(Map.of_alist_multi (module Account))\n  |&gt; Map.map ~f:(Map.map ~f:(Map.of_alist_multi (module Ticker)))\n  |&gt; Map.map ~f:(Map.map ~f:(Map.map ~f:(List.sum (module Float) ~f:Fn.id)))\n;;\n\nlet prices : float Ticker.Map.t =\n  [ \"AAPL\", 10; \"GOOG\", 15 ]\n  |&gt; List.map ~f:(fun (ticker, price) -&gt; Ticker.of_string ticker, Int.to_float price)\n  |&gt; Map.of_alist_exn (module Ticker)\n;;\n\n\n\nHere’s how you might write the (eager) computation normally:\n\nlet%expect_test _ =\n  let cost_for_one_position (ticker, quantity) =\n    let price = Map.find_exn prices ticker in\n    quantity *. 
price\n  in\n  let cost_for_one_account by_ticker =\n    Float.abs (List.sum (module Float) (Map.to_alist by_ticker) ~f:cost_for_one_position)\n  in\n  let cost_for_one_bank by_account =\n    List.sum\n      (module Float)\n      (Map.data by_account)\n      ~f:cost_for_one_account\n  in\n  let cost = List.sum (module Float) (Map.data positions) ~f:cost_for_one_bank in\n  print_endline (Float.to_string_hum cost);\n  [%expect {| 1_050.000 |}]\n;;\n\n\n\nThis is pretty standard stuff. However, this calculation is difficult\nto incorporate into an optimization like gradient descent since we\ndon’t have any information about derivatives. Here’s how you’d convert\nthe same calculation to use the Computation API:\n\n\n  \n    *jane-vc-patdiff*\n    \n  \n  \n    \n!  let cost_for_one_position bank account (ticker, quantity) =\n     let price = Map.find_exn prices ticker in\n+    Computation.variable\n+      (Computation.Variable.create\n+         ~id:\n+           (Computation.Variable.ID.of_string\n+              (sprintf !\"%{Ticker} @ %{Bank}/%{Account}\" ticker bank account))\n!         ~initial_value:(quantity *. price))\n   in\n!  let cost_for_one_account bank (account, by_ticker) =\n-    Float.abs (List.sum (module Float) (Map.to_alist by_ticker) ~f:cost_for_one_position)\n!    Computation.abs\n!      (Computation.sum\n!         (List.map (Map.to_alist by_ticker) ~f:(cost_for_one_position bank account)))\n   in\n!  let cost_for_one_bank (bank, by_account) =\n-    List.sum (module Float) (Map.data by_account) ~f:cost_for_one_account\n!    Computation.sum (List.map (Map.to_alist by_account) ~f:(cost_for_one_account bank))\n   in\n-  let cost = List.sum (module Float) (Map.data positions) ~f:cost_for_one_bank in\n!  let cost = Computation.sum (List.map (Map.to_alist positions) ~f:cost_for_one_bank) in\n!  
print_endline (Float.to_string_hum (Computation.evaluate cost));\n   [%expect {| 1_050.000 |}]\n\n  \n\n\nAs you can see, the code looks pretty similar: we’re mostly swapping\nout List.sum for Computation.sum, Float.abs for\nComputation.abs, and declaring our\nComputation.Variable.ts. The result of evaluating it is the\nsame, which is obviously good. Further, we now have the partial\nderivatives for each of our variables:\n\nlet%expect_test _ =\n  (* same as above *)\n  Computation.For_testing.print_derivatives cost;\n  [%expect\n    {|\n    ┌─────────────────────────┬────────┐\n    │                variable │  ∂f/∂v │\n    ├─────────────────────────┼────────┤\n    │ AAPL @ BANK A/ACCOUNT W │  1.000 │\n    │ AAPL @ BANK A/ACCOUNT Y │  1.000 │\n    │ AAPL @ BANK B/ACCOUNT X │  1.000 │\n    │ AAPL @ BANK B/ACCOUNT Z │ -1.000 │\n    │ GOOG @ BANK A/ACCOUNT W │  1.000 │\n    │ GOOG @ BANK A/ACCOUNT Y │  1.000 │\n    │ GOOG @ BANK B/ACCOUNT X │  1.000 │\n    │ GOOG @ BANK B/ACCOUNT Z │ -1.000 │\n    └─────────────────────────┴────────┘ |}]\n;;\n\n\n\nIntuitively, these results make sense. We’re net “long” in every bank\naccount except BANK B/ACCOUNT Z, so an extra dollar there\nreduces our absolute net market value and therefore our overall cost, whereas an\nextra dollar anywhere else increases it.\n\nDebugging computations\n\nWe don’t have to stop here. By virtue of our representation of\ncomputations as statically defined trees, we can do some pretty\npowerful things.\n\nIn particular, as we’re evaluating a computation, we’re traversing the\nentire tree and evaluating the value at each node. 
We can track that\ninformation as we go and display it in an expect-test-friendly way:\n\nlet%expect_test _ =\n  (* same as above *)\n  Computation.For_testing.print_debug_tree cost;\n  [%expect\n    {|\n    ◉ SUM(...): 1050.00\n    ├──◉ SUM(...): 450.00\n    |  ├──◉ ABS(...): 25.00\n    |  |  └──◉ SUM(...): 25.00\n    |  |     ├──• AAPL @ BANK A/ACCOUNT W: 100.00\n    |  |     └──• GOOG @ BANK A/ACCOUNT W: -75.00\n    |  └──◉ ABS(...): 425.00\n    |     └──◉ SUM(...): 425.00\n    |        ├──• AAPL @ BANK A/ACCOUNT Y: 200.00\n    |        └──• GOOG @ BANK A/ACCOUNT Y: 225.00\n    └──◉ SUM(...): 600.00\n       ├──◉ ABS(...): 250.00\n       |  └──◉ SUM(...): 250.00\n       |     ├──• AAPL @ BANK B/ACCOUNT X: -200.00\n       |     └──• GOOG @ BANK B/ACCOUNT X: 450.00\n       └──◉ ABS(...): 350.00\n          └──◉ SUM(...): -350.00\n             ├──• AAPL @ BANK B/ACCOUNT Z: 100.00\n             └──• GOOG @ BANK B/ACCOUNT Z: -450.00 |}]\n;;\n\n\n\nWe’ve found this to be quite useful in practice. For one thing, it\nmakes debugging a lot easier. Instead of littering our code with print\nstatements we can inspect all intermediate values at once.\n\nThe expect-test integration encourages an exploratory style, in which\nwe begin with some small part of the computation and incrementally\nbuild on it until the entire function is complete. This has proved\nuseful when writing computations from scratch, when the incidence of\nbugs tends to be the highest.\n\nFurther, because we’re observing more than just the final value, we\ncan be more confident that the final calculation is correct. For\nexample, suppose we have a function consisting of a series of nested\nmax operations. For any given set of inputs, we’re only\nexercising some subset of the paths in the calculation, and the final\nvalue only reflects a subset of terms. 
It may be that the terms on\nsome unexercised paths are actually calculated incorrectly; by\nprinting out all intermediate values, this becomes much harder to\nmiss.\n\nFinally, these expect-tests make it easier to understand and review\nlogical changes in the calculation, regardless of what’s changed in\nthe code.\n\nNote that it’s possible to avoid printing out an unreadably large tree\nfor complex calculations by limiting the depth to which the tree is\nprinted and/or by only including particularly tricky terms.\n\nDocumenting computations\n\nTime for one more magic trick. As with any calculation, once you’ve\nimplemented it, you should take the time to document it. We do this a\nlot, but manual documentation inevitably grows stale, and eventually\nfallow, and it may no longer reflect the actual\nimplementation. Luckily, since we know their structure, we can\nautomatically generate documentation for our computations. Let’s try\nthis out:\n\nlet%expect_test _ =\n  (* same as above *)\n  cost |&gt; Computation.to_LaTeX_string |&gt; print_endline;\n  [%expect\n    {|\n    \\left(\\left| \\left(\\texttt{AAPL @ BANK A/ACCOUNT W} + \\texttt{GOOG @ BANK A/ACCOUNT W}\\right) \\right| + \\left| \\left(\\texttt{AAPL @ BANK A/ACCOUNT Y} + \\texttt{GOOG @ BANK A/ACCOUNT Y}\\right) \\right|\\right) + \\left(\\left| \\left(\\texttt{AAPL @ BANK B/ACCOUNT X} + \\texttt{GOOG @ BANK B/ACCOUNT X}\\right) \\right| + \\left| \\left(\\texttt{AAPL @ BANK B/ACCOUNT Z} + \\texttt{GOOG @ BANK B/ACCOUNT Z}\\right) \\right|\\right) |}]\n;;\n\n\n\nNow let’s render that in a LaTeX block:\n\n\n  \n    \n    \n  \n$$\n\\left(\\left| \\left(\\texttt{AAPL @ BANK A/ACCOUNT W} + \\texttt{GOOG @ BANK A/ACCOUNT W}\\right) \\right| + \\left| \\left(\\texttt{AAPL @ BANK A/ACCOUNT Y} + \\texttt{GOOG @ BANK A/ACCOUNT Y}\\right) \\right|\\right) + \\left(\\left| \\left(\\texttt{AAPL @ BANK B/ACCOUNT X} + \\texttt{GOOG @ BANK B/ACCOUNT X}\\right) \\right| + \\left| \\left(\\texttt{AAPL @ BANK B/ACCOUNT Z} 
+ \\texttt{GOOG @ BANK B/ACCOUNT Z}\\right) \\right|\\right)\n$$\n  \n\n\nThat’s technically correct, but not all that useful. It’s documenting\nthe specific computation we’re evaluating, not its generalized\nform. We can resolve this by introducing some “metavariables”. In\nessence, a metavariable is a symbol that’s used in a formula – it\ndoesn’t represent an actual variable in the computation. Annotating a\ngiven computation to use metavariables is straightforward. In the\nexample above, we can write:\n\n\n  \n    *jane-vc-patdiff*\n    \n  \n  \n    \n     let price = Map.find_exn prices ticker in\n-    Computation.variable\n+    Computation.variablei\n+      ~to_LaTeX:(fun [ (`ticker, ticker); (`account, account); (`bank, bank) ] -&gt;\n+        \"MktVal\"\n+        |&gt; Nonempty_string.of_string_exn\n+        |&gt; LaTeX.of_alpha_exn\n+        |&gt; LaTeX.mathsf\n+        |&gt; LaTeX.function_application\n+             ~args:\n+               [ LaTeX.of_metavariable ticker\n+               ; LaTeX.of_metavariable bank\n+               ; LaTeX.of_metavariable account\n+               ])\n       (Computation.Variable.create\n          ~id:\n            (Computation.Variable.ID.of_string\n               (sprintf !\"%{Ticker} @ %{Bank}/%{Account}\" ticker bank account))\n          ~initial_value:(quantity *. 
price))\n   in\n   let cost_for_one_account bank (account, by_ticker) =\n     Computation.abs\n-      (Computation.sum\n+      (Computation.sumi\n+         ~metavariable:Metavariable.(of_string \"ticker\" &lt;~ `ticker)\n          (List.map (Map.to_alist by_ticker) ~f:(cost_for_one_position bank account)))\n   in\n   let cost_for_one_bank (bank, by_account) =\n-    Computation.sum (List.map (Map.to_alist by_account) ~f:(cost_for_one_account bank))\n+    Computation.sumi\n+      ~metavariable:Metavariable.(of_string \"account\" &lt;~ `account)\n+      (List.map (Map.to_alist by_account) ~f:(cost_for_one_account bank))\n-  in\n-  let cost = Computation.sum (List.map (Map.to_alist positions) ~f:cost_for_one_bank) in\n+  in\n+  let cost =\n+    Computation.sumi\n+      ~metavariable:Metavariable.(of_string \"bank\" &lt;~ `bank)\n+      (List.map (Map.to_alist positions) ~f:cost_for_one_bank)\n+  in\n\n  \n\n\nAlready this results in much more legible LaTeX output:\n\n\n\nNow, suppose we need to modify our calculation to multiply the\nabsolute net market value at each bank by some predefined rate, which varies across\nbanks. As soon as we update the definition of our Computation.t\nand attach the relevant documentation information to new terms, our\nauto-generated LaTeX formula updates as expected:\n\n\n\nOptimizing computations\n\nAs we mentioned at the beginning of the post, the motivating use-case\nfor developing Gradient_calculator was to provide developers a\nway to easily construct computations that could be optimized using\ngradient descent. The library provides an implementation of gradient\ndescent that builds on top of the Computation.t abstraction. In\nparticular, it exposes an optimize function which takes in some\nComputation.t and determines the values to assign to variables\nin order to minimize that computation.  
Let’s see this in action,\nrevisiting our toy example:\n\nlet%expect_test _ =\n  let context = Evaluation_context.initialize computation in\n  let%bind outcome =\n    Gradient_descent.optimize ~debug:true computation context ~step_size:(Fixed 1e-2)\n  in\n  print_s ([%sexp_of: Gradient_descent.Outcome.t] outcome);\n  [%expect {| (Converged (on_iter 1418)) |}];\n  For_testing.print_debug_tree computation;\n  [%expect\n    {|\n    ◉ SQUARE(...): 0.000\n    └──◉ SUM(...): 0.000\n       ├──• x: -0.000\n       └──◉ SQUARE(...): 0.000\n          └──◉ SUM(...): -0.021\n             ├──• y: -1.021\n             └──• CONST: 1.000 |}];\n  return ()\n;;\n\n\n\n\nAs we’d expect, x ≈ 0 and y ≈ −1 is a minimum of our function.\n\nFor our more realistic net market value example, we would do something\nsimilar, except we’d want to pass additional constraints to\noptimize to ensure that our overall position in each ticker\nacross all accounts remains the same. (After all, we’re looking for\nthe optimal set of transfers to make across accounts, but we aren’t\nreducing or increasing the total number of shares we hold in each\nticker.)\n\nFuture work\n\nThere’s still a long list of things we can do to further improve this\nlibrary. Here are just a few ideas our team is thinking about:\n\nSupport for incrementality\n\nEvery time evaluate is called, the entire tree is traversed and\nvalues are recomputed from scratch. We could instead have\nevaluate run incrementally, only recomputing the values and\nderivatives for variables whose values changed and any other dependent\nnodes (transitively).\n\nCompiler-inspired optimizations\n\nWe can think of our Computation.t as a sort of “intermediate\nrepresentation” of a computation, with Gradient_calculator’s\nconstructors serving as the “front-end” (i.e. going from some\nspecification of a computation to a concrete Computation.t) and\nthe evaluate function serving as the “back-end” (i.e. 
going from\nthis Computation.t to a value plus derivative information plus a\nLaTeX formula and so on…). Drawing inspiration from some common\ncompiler optimizations, we could add support for constant folding,\ncommon sub-expression elimination, etc. so that our IR\nComputation.t is simplified as much as possible. Some of these\noptimizations may require us to represent our Computation.t as a\nDAG as opposed to a tree.\n\nAutomatic differentiation using algebraic effects\n\nAlgebraic effects, landing in OCaml 5.0, could provide a mechanism for\nimplementing automatic differentiation using effect handlers, as\ndescribed in this paper and\ndemonstrated in this GitHub\nexample. This\nwould obviate the need to represent computations as trees, which would\npresent a more memory-efficient alternative at the expense of some of\nthe other features discussed in this post.\n\nAlternative display formats\nCurrently, we only support printing the debug tree as an ASCII diagram\nand the formula in LaTeX. We could add support for alternative display\nformats as well (e.g., an SVG version of the tree) so that this\ninformation can be rendered in the most suitable way.\n\nOpen-sourcing the project\n\nOur plan is to release this project as an open source tool soon, at\nwhich point we’ll be eager for any other ideas, suggestions, or\nfeedback.\n\n\n\n",
        "url"      : "https://blog.janestreet.com/computations-that-differentiate-debug-and-document-themselves/",
        "image"    : "https://blog.janestreet.com/computations-that-differentiate-debug-and-document-themselves/cover.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Introducing the Jane Street Graduate Research Fellowship",
        "date"     : "August 30, 2022",
        "authorId" : "eberger",
        "author"   : "Emily Berger",
        "tags"     : [],
        "minsToRead" : 2,
"content"  : "We are excited to announce the launch of the Jane Street Graduate Research Fellowship!\n\nThis is a one-year fellowship for exceptional doctoral students\ncurrently pursuing a PhD in mathematics, computer science, or\nstatistics.\n\nAt Jane Street we take a rigorous, quantitative approach to trading on\nglobal markets, combining techniques from machine learning,\ndistributed systems, programmable hardware, statistics, and applied\nmathematics. Our culture is steeped in games, puzzles, and challenging\nproblems. With the Graduate Research Fellowship, we’re excited to\nsupport PhD students who share our values: technical excellence,\nintellectual curiosity, and humility.\n\nSee below for more details and feel free to direct questions to\ngraduate-research-fellowship@janestreet.com.\n\nAward details\n\n\n  \n    Full tuition and fees will be fully covered for the academic year.\n  \n  \n    A $40,000[1] USD stipend will be provided to help with living\nexpenses while in school.\n  \n  \n    Fellows will be invited (but not required) to visit New York City to\ngive a talk on any topic of their choosing to Jane Street employees\nand other fellows; all expenses will be paid.\n  \n\n\nEligibility\n\n\n  \n    PhD students who will be in their 2nd to 6th year of a PhD program\nin mathematics, computer science, or statistics in the next academic\nyear are eligible to apply.\n  \n  \n    Students must remain in good academic standing as an active,\nfull-time member of their PhD program at a degree-granting university\nin the United States for the duration of the academic year of the\naward, or forfeit the award. Deferrals for medical/other reasons may\nbe granted on a case-by-case basis.\n  \n\n\nApplication\n\nYou can find more details on the program’s webpage: Jane Street\nGraduate Research\nFellowship. 
That\nincludes a link to the application.\n\nUniversity eligibility\n\nIn general, PhD fellowships require university-specific legal\nnegotiations; we currently have arrangements with several institutions\nand are actively working on growing this list. Meanwhile, we are happy\nto award fellowships to students in programs at universities where we do\nnot yet have an agreement in place. If accepted, we will work with your\nprogram to arrange the award of your fellowship. Universities must be\naccredited research institutions in the United States that award\ndoctoral degrees.\n\n[1] Note that stipends are taxable and may be subject to withholdings.\n",
        "url"      : "https://blog.janestreet.com/graduate-research-fellowship/",
        "image"    : "https://blog.janestreet.com/graduate-research-fellowship/GRF.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "What the interns have wrought, 2022 edition",
        "date"     : "August 25, 2022",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : ["internship"],
        "minsToRead" : 10,
        "content"  : "We’re once again at the end of our internship season, and it’s my task\nto provide a few highlights of what the interns accomplished while\nthey were here.\n\nAnd once again, the program was bigger than ever.  We had 122 dev\ninterns globally, coming from 42 different schools and 13\ncountries. As the program grows, it gets harder and harder to\nsummarize it with any degree of fidelity by picking just three\nprojects.\n\nBut, choose we must! So here are my somewhat arbitrarily chosen\nexample projects:\n\n\n  \n    Aaron Lamoreaux extended magic-trace to\nsupport a sampling mode, along with several other improvements.\n  \n  \n    Matthew Sirman worked on integrating Datafetcher, a library for\nwriting testable and efficient data-loading-and-transformation\ntasks, with\nIncremental, a\nlibrary for building on-line algorithms.\n  \n  \n    David Vulakh extended our\nQuickcheck\nproperty-based testing library with a convenient syntax for\nspecifying contracts to simplify the testing of complex APIs.\n  \n\n\nNow, let’s dive in to the details!\n\nSampling-mode for magic-trace\n\nmagic-trace is an open-source tool we built that takes advantage of\nIntel Processor Trace (Intel PT) to provide hyper-detailed traces of\neverything a program did for a small window of time, around 10ms.\n\nThe initial version of magic-trace was implemented as an intern\nproject (briefly mentioned\nhere),\nand we’ve since open-sourced\nit and made it available on\nGithub.\n\nPart of what makes magic-trace great is the enormous detail afforded\nby Intel PT’s ability to cheaply write down every branch taken by a\nprocess. But another one of magic-trace’s super-powers is its extreme\nease-of-use.  Lots of performance analysis tools require you to figure\nout the right arcane invocation, and then stare at some confusing text\noutput to divine what’s going on.\n\nmagic-trace tries really hard to get rid of all that noise.  
The\ncommand-line tool does sensible things by default, and instead of\nstaring at an arcane blob of text, it’s integrated with a modified\nversion of Perfetto, a pretty and intuitive\ntrace-viewer from Google.\n\nAaron Lamoreaux’s primary project was to extend magic-trace to cover\nmore territory, while still maintaining magic-trace’s great\nuser-experience.  In particular, Aaron added a sampling mode to\nmagic-trace, so, rather than being able to see everything that\nhappened in the 10ms leading up to a given event, you could instead\nget sampled data for the last 20 seconds before an interesting event,\nor even 20 minutes.  Another win is that sampling can be done on AMD\nboxes, while Intel PT is an Intel-only feature.\n\n\n\nA lot of the hard work here was about getting the user experience right,\nand in particular figuring out how to pick good default behaviors\ndepending on what users are doing and what hardware they’re running\non.  E.g., picking a reasonable sampling rate, or deciding whether to\nuse LBR or\nDWARF\nwhen interpreting stack traces.\n\nThe end result hasn’t yet been rolled into production, but some folk\nfrom our Marketdata team have already put it to good use, figuring out\nthat one of our packet-processing applications was wasting a lot of\ntime in the GC due to allocating something on each packet.\nmagic-trace made it trivial to see the problem, and to figure out\nwhere the stray allocation was coming from.\n\nWhile sampling was his main project, Aaron got some other magic-trace\nimprovements out too, like:\n\n\n  \n    multi-process tracing, so you could see combined traces from two\napplications running on the same box, and see what happens as data\nflows from one to the other.\n  \n  \n    CPU-frequency transition reporting, so you can see when a use of\nAVX2\ninstructions causes you to stall for 14µs!\n  \n  \n    Optimizing magic-trace’s trace decoding step by 10%-50% (depending\non the trace) by writing a dlfilter\nback-end\nfor perf.\n  
\n\n\nMarrying Datafetcher and Incremental\n\nDatafetcher is a library that we talked about in last year’s\n“wrought”\npost,\nwhich is designed to help you write jobs that fetch and transform\ndata. Critically, it understands the structure of the job it’s running,\nwhich lets it both optimize the execution, e.g., by batching requests,\nand helps you build tests by automatically recording external data so\nthat you can replay it in your tests.\n\nA traditional datafetcher program is a batch job: E.g., grab a bunch\nof data from different services, munge it all together, and produce\nconfigs for your desk’s trading infrastructure.\n\nBut sometimes, we want to build systems that look at both static data\nsources and live marketdata.  Today, if we build such a computation in\nDatafetcher, we’d have to run it over and over in a loop, which is\nreally slow!\n\nMatthew Sirman’s task was to figure out how we could make Datafetcher\njobs that could respond efficiently to live data, without having to be\nrerun from scratch.  In particular, he did that by combining\nDatafetcher with a library called Incremental.\n\nWe’ve talked\nabout\nIncremental\na\nlot\nover\nthe\nyears,\nbut the basic point of Incremental is to help you build programs that\ncan be refreshed cheaply when their inputs change.\n\nIncremental and Datafetcher are in some ways pretty similar: they both\norganize your computation as a graph, tracking dependencies between\ndifferent stages of the work; and they use similar APIs for doing so,\nboth taking advantage of our syntax\nextension for making such\ncomputations easier to read and write.\n\nBut they’re also really different!\n\n\n  Datafetcher provides batched and testable all-at-once\nexecution of asynchronous programs\n  Incremental provides incremental execution of synchronous\nprograms.\n\n\nWhy not both? 
Can we build a system that gives us batched,\ntestable, and incremental execution of an asynchronous\nprogram, by essentially compiling a Datafetcher program into an\nIncremental computation?\n\nWe can, and doing that was the core of Matthew’s project.\n\nThere were a lot of things to do along the way. One step was to figure\nout how to write an incremental computation that could deal with\nasynchrony. This required a way of representing in-progress\nasynchronous work within an Incremental computation.  The solution here was to add a new type\n(called Response_or_waiting) that represented the current state of\nan asynchronous computation.\n\nFiguring out when and where to memoize was another challenge.  For\nintermediate spots in the computation, the decision was to give users\nthe ability to explicitly decide where memoization should occur.  For\nfetch-nodes, memoization was important for correctness and so was not\noptional, so that computations would behave consistently, even in\ncases where re-issuing the same fetch request could lead to different\nresults.\n\nThere were other issues too, like figuring out how to conveniently\ntest incremental datafetcher jobs, and how to automatically convert\nordinary datafetcher jobs (which get the data just once) into\nincremental ones, which will continue to update via polling.\n\nIn addition to working on extensions to the Datafetcher library,\nMatthew got to utilize those extensions to improve the\ndesk-application that motivated this work in the first place, getting\nto see the end-to-end application of his work.  It’s not quite in\nproduction yet, but the results look good so far.\n\nContracts for Quickcheck\n\nTesting is something we spend a lot of time on here, and over the\nyears, we’ve built a lot of tools to make testing easier and more\npleasant.\n\nOne of those tools is our\nQuickcheck\nlibrary.  
Quickcheck helps you with property-based testing, which is\na style of testing where you marry together a collection of properties\nyou want to test with a mechanism for generating random examples to\ntest with.\n\nQuickcheck aims to make this kind of testing easy. It does this\nprimarily by providing libraries for building probability\ndistributions, combined with a syntax extension that obviates most of\nthe boilerplate of using those libraries.  Here’s an example.\n\ntype shape =\n  | Circle of { radius: float }\n  | Rect of { height: float; width: float }\n  | Poly of (float * float) list\n[@@deriving quickcheck];;\n\n\n\nGiven the above, ppx_quickcheck will generate a probability\ndistribution automatically.  That distribution makes a bunch of\nchoices (like equiprobably returning a Circle, Rect, or Poly),\nbut you can add more annotations to tweak the distribution if you need\nto.\n\nEven with all this, there’s still a decent amount of work required to\nset up each test for each property you want to validate.  That’s where\nDavid Vulakh’s project came in.  David added a new syntax for\nreducing the boilerplate further.  In particular, his syntax extension\nlets you specify explicit contracts associated with each call in an\ninterface, and the rest of the test generation can be driven from\nthat.\n\nThis is maybe easiest to explain with an example.  Here’s a subset of\nthe String API in Base, to\nwhich this contract syntax has been applied:\n\nval to_list : t -&gt; char list\n[@@contract fun t -&gt; equal t (t |&gt; to_list |&gt; of_char_list)]\n\nval length : t -&gt; int\n[@@contract fun t -&gt; length t = List.length (to_list t)]\n\n\n\nNote that contracts can use multiple functions from the same API to\nexpress their property.  
In the case of to_list, the contract checks\nthat if you take a string, convert it to a list of characters, and\nthen back into a string, the result is equal to the string you started\nwith.\n\nThe way in which these contracts are exercised is pretty simple:\nregular Quickcheck generators are used to create all of the inputs for\na given function, and then the function is called on them, and the\nproperty is checked, signaling an error if the property fails.\n\nThis use-case is pretty simple, but there are some complicated corners\nthat needed to be figured out, like providing more annotations to\nallow customizing the probability distributions used for particular\narguments.\n\nAfter finishing up the contract work, David then worked on another\nextension to Quickcheck, which is support for\nbisimulation-style\ntests.  The idea of bisimulation is to have two different\nimplementations of the same API, and to test them against each other.\nWhat’s tricky here is that it’s not enough to generate random inputs\nand call each function; you actually need to call sequences of\noperations to build up the values that you’re operating on.  This is\nespecially useful for tricky, performance sensitive data-structures,\nwhere it’s easy to write a correct-but-slow version, and really quite\nhard to write a correct and well-optimized one.\n\nThere’s a bit more work to do to get the bisimulation mode to the\npoint where it really covers everything we need, but it’s already at a\nstage where it’s useful for real examples.\n\nSo, what are you waiting for?\n\nSummarizing the whole summer in just a handful of projects is an\nimpossible task, but I hope this gives you a flavor of the kinds of\nthings that our interns are able to accomplish.  (And I should\nmention: this work only reflects part of their work! 
Each intern\nactually works in two different areas over the course of the summer.)\n\nIf this sounds like the kind of work you’d like to do, then you should\napply.\nThe internship is a great chance to learn, both about software\nengineering, and about the world of trading.  And if you’re curious\nabout our interview process, check\nthis out.\n",
        "url"      : "https://blog.janestreet.com/what-the-interns-have-wrought-2022/",
        "image"    : "https://blog.janestreet.com/what-the-interns-have-wrought-2022/WTIHW-2022-v3.jpg",
        "topic"    :  ["technology","internship"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Research internships in our Tools and Compilers group",
        "date"     : "March 4, 2022",
        "authorId" : "gyorsh",
        "author"   : "Greta Yorsh",
        "tags"     : [],
        "minsToRead" : 2,
"content"  : "We are excited to announce research internships in our Tools and\nCompilers group.\n\nResearch internships in T&C are designed for PhD students in\nprogramming languages, compilers, verification, or related areas.\nResearch interns will be applying the latest research in their area to\nreal-world problems, pushing the boundaries of programming languages.\n\nAt Jane Street, we have long benefited from our close relationship\nwith the programming languages research community. Research\ninternships feel like a natural extension of that relationship:\ncollaborating with promising young researchers on some of the\nchallenging problems we’re trying to solve.\n\nThe team\n\nJane Street’s Compilers team focuses on improving OCaml as a\nfoundation for Jane Street’s ever-growing technology stack, in\ncollaboration with the greater OCaml community. We aim to make it\neasier for developers to express their ideas in OCaml, to improve the\nperformance of the generated code, and to make the OCaml compiler\nitself faster and easier to use.\n\nWe also extend and enhance the surrounding toolchain, working on tools\nfor profiling, debugging, documenting, and building OCaml code. 
The\nvast majority of our work is open-source, and we upstream as much as\nwe can to the mainstream OCaml compiler.\n\nOver the years, we’ve done more and more applied PL research\nourselves, working on every aspect of OCaml, including extending the\ntype system with support for novel language features, re-engineering\nthe optimizer from the ground up, and feedback-directed optimization.\n\nFollow the links to learn more about our recent work on unboxed\ntypes for efficient memory layouts; modal types for\nsupporting safe stack allocation, software prefetching for OCaml’s\ngarbage collector, which led to massive speedups in the\nmarking phase; evaluating the best-fit memory allocator,\nmagic-trace, memtrace; build\nsystems, user interfaces, language design and\noptimization for OCaml.\n\nProject ideas\n\nHere are a few areas we know we’d be interested in exploring:\n\n\n  Type systems that track locality and uniqueness\n  Verifying C bindings with respect to OCaml’s garbage collector\n  Super-optimization\n\n\nBut that’s not an exhaustive list, and we’d love to hear new ideas\nfrom applicants.\n\nLogistics\n\nOur existing software engineering internships program is aimed at\nundergraduate students. Our interns have landed a bunch of exciting\nprojects, often related to programming languages. Some projects\nwe would like to pursue require specialized knowledge and\nexperience beyond the undergraduate-level curriculum.  The research\ninternships will run alongside the existing internships to fill this\ngap, and will have a slightly different selection process.\n\nDuring the application process, we will work with the candidate to\nidentify a project that aligns with their research expertise and\ninterests.\n\nThe open-source nature of T&C projects means collaborations with\nresearch interns may continue outside of the time frame of the\ninternship.  
Our goal is to select projects that can lead to\npublication of the work in a scientific journal or conference.\n\nResearch internships can be hosted in our London and New York offices\nall year round.\n\nYou can find the applications here:\n\n\n  London:\nhttps://www.janestreet.com/join-jane-street/position/5866838002/\n  New York:\nhttps://www.janestreet.com/join-jane-street/position/5869205002/\n\n\n",
        "url"      : "https://blog.janestreet.com/research-internships-tnc/",
        "image"    : "https://blog.janestreet.com/research-internships-tnc/ResearchInternshipsTnC.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "How Jane Street Pairs Interns to Projects and Teams During the Software Engineering Internship",
        "date"     : "January 14, 2022",
        "authorId" : "smitchell",
        "author"   : "Sydney Richiez",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "Software engineering intern candidates often ask how team placement\nworks and how much input incoming interns have over their teams and\nprojects. We know team placement is an important factor for many\nstudents when deciding which internship to accept. We’ve spent\nconsiderable time and thought on this process in recent years and hope\nto demystify the experience with this post.\n\nHow are projects selected?  — The process is decentralized: each\nwould-be intern mentor comes up with a set of project ideas based on\nthe needs of their team. These proposals are then reviewed by software\ndevelopers that run the internship program.\n\nTeams select projects that meet the following criteria:\n\n\n  \n    The project is useful and interesting: The team must actually want\nthe project completed: we don’t give interns throwaway projects or\nbusywork.\n  \n  \n    It is designed so that the intern produces code early: We want\ninterns to get feedback on their work as early as possible.\n  \n  \n    It contains multiple milestones where the code is releasable: We\nwant interns to see some of their work in production before the end of\nthe internship even if they don’t finish the entire project.\n  \n\n\nHow is team placement determined?\n\nA week or two before the internship starts, we set up a Zoom call\nbetween each intern and a software engineer involved in running the\ninternship where we talk in depth about the intern’s preferences.\n\nWe’ll ask if the incoming intern has preferences about:\n\n  The area of the firm their team works in\n  The type of technical work involved in the project\n  Team dynamics and working styles\n  Things they don’t want to work on\n\n\nAs we talk through each of these questions and others, we’ll share\nmore context about the kinds of work being done on various teams. This\ngives the interns an opportunity to express less concrete and more\nopen-ended preferences.  
We also ask how much every preference matters\nto the intern.  While we hope to match every preference an intern has,\nthere are times when we have to compromise, so we use the strength of\npreferences to make sure we get the interns to the best possible fit\nfor them.\n\nNot having strong preferences about any of these things is perfectly\nfine. We think all of our mentors and projects are great, but giving\nfolks the opportunity to express preferences when they have them has\nbeen a major win for us and the interns.\n\nWhy don’t we just provide a list of projects and let interns pick?\n\nFairness: Resolving situations where many interns want the same\nproject would be hard to do equitably.\n\nLack of context: Without a good understanding of our tech stack\nand what teams do, it might be hard to know from a short project\ndescription how well a project actually aligns with your preferences.\n\nTeam fit considerations: For many people, group dynamic and\nworking style are much more important than the content of a project.\n\nLogistics: We ask for projects that are relevant to the needs\nof the team and are something the mentors actually care about and want\nto get into production. This means that projects aren’t known months\nbefore the internship. If the project was something they could write\ndown and wait six months on, it probably isn’t something they care\nmuch about getting done. In many cases, we are receiving project\ndescriptions up until the weeks before the internship.\n\nWhat have interns worked on in the past?\n\nAt the end of each summer, we share some highlights in our “What the\ninterns have\nwrought”\npost. You can find years’ worth of past\nposts on our tech\nblog.\n\n",
        "url"      : "https://blog.janestreet.com/project-pairing/",
        "image"    : "https://blog.janestreet.com/project-pairing/NewProjectPairing.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Magic-trace: Diagnosing tricky performance issues easily with Intel Processor Trace",
        "date"     : "January 11, 2022",
        "authorId" : "thume",
        "author"   : "Tristan Hume",
        "tags"     : [],
        "minsToRead" : 17,
        "content"  : "Intel Processor Trace is a hardware technology that can record all\nprogram execution flow along with timing information accurate to\naround 30ns. As far as I can tell almost\nnobody uses it, seemingly because capturing the data is tricky and,\nwithout any visualization tools, you’re forced to read enormous text\ndumps.\n\nMagic-trace is a tool we built and open-sourced to make it easy to\ncapture a trace of around 10ms leading up to a function call you\nchose to instrument, and then visualize the call stack on a timeline\nwhere you can zoom in and see every function call and how long it\ntook. Here’s a captured trace of 5ms of OCaml program startup:\n\n\n\nAnd here’s the same trace zoomed in to an arbitrary 500 nanoseconds.\nThe thin red events are 1-3 nanoseconds:\n\n\n\nRecently we’ve been using this tool to diagnose performance issues that\nwould be very difficult to solve with other tools. Using it is as easy\nas adding a Magic_trace.take_snapshot call to your code (or using a\nfuzzy-finder to select any existing function), then running\nmagic-trace attach and using the fuzzy-finder to select your process.\nIt’ll spit out a trace you can view in Google’s Perfetto trace\nviewer.\n\nIn this post we’ll go over why Processor Trace is so special, the\ndifficulties of building something on top of a hardware technology\nalmost nobody uses, how we were beset by a kernel bug and a hardware\nbug, and the kinds of problems we’ve been able to solve with the\ntool.\n\nWhy Intel Processor Trace, and why not?\n\nLet’s look at the major types of performance analysis tools and why\nmagic-trace serves a different niche:\n\nSampling profilers interrupt the program every 250 microseconds or\nso, sample the current call stack, and then summarize them all\ntogether. These are great for giving you a sense of where your program\nis spending its time. 
However, at Jane Street we have lots of\nhigh-performance trading systems that spend nearly all of their time\nwaiting for network packets that we want to respond to in far less\nthan the 250-microsecond sampling interval. Sampling profilers are\napproximately useless for diagnosing latency issues on that scale:\nyou’d be lucky to get one sample in the code you care about!\n\nEven in more traditional systems, you may want to diagnose short but\nrare tail latency events, or notice the difference between a function\nbeing called 10 times more than you expected or one call to it taking\n10 times longer than expected, which a sampling profiler can’t tell\nyou.\n\nInstrumentation-based tracing either patches or compiles probes\ninto a program that record when certain functions start and end, then\ntypically visualizes them on an interactive timeline UI. We re-use the\nUI from the Perfetto tracing system for magic-trace, although we\nneeded to fork it to better handle events at the scale of single\nnanoseconds. High-performance tracing systems like tracy even\nmanage to get the overhead down to around 2ns per call (we built a\nsimilar system for OCaml and open-sourced it). However,\ninstrumenting every single function is risky (e.g. you might triple the\ncost of a 1ns function that’s called everywhere) so typically they\nrequire manual instrumentation, and sometimes your performance problems\nare in an app or function you haven’t annotated.\n\nHardware tracing like Intel Processor Trace (IPT) has the\nadvantages of tracing but doesn’t require any instrumentation, and can\nhave much lower overhead than instrumenting everything. They use a very\nefficient format that only encodes just enough info to reconstruct the\ncontrol flow – for example conditional branches take one bit. 
Time\noverhead for IPT varies from 2-20% depending on the program, with every\none of our programs I’ve benchmarked experiencing less than a 10%\nslowdown and usually under 5%.\n\nThere are a few downsides to Processor Trace though:\n\n\n  Many VMs don’t support it and it needs a post-Skylake Intel processor\n(some other vendors have similar tech; AMD doesn’t yet).\n  You have no choice but the full 1GB/s firehose (with the exception of\nsome limited filtering options) so it’s difficult to store and\nanalyze longer traces. With instrumentation you can manually pick the\nimportant functions and economize on trace size.\n  Decoding is slow because it needs to follow along with disassembled\ninstructions from the binary and reconstruct the flow. Other than\nspecialized decoders for fuzzing, the fastest decoder is 60x slower\nthan real time.\n\n\nA minimum viable product\n\nDuring Jane Street’s 2021 summer internship, I was talking to\nsome colleagues about our issues profiling very short interesting time\nsegments. I noted that Intel Processor Trace would be great for this\nbut that it was really hard to use. Then I realized that with the\ntrace visualization library I had just written, and some features from\nthe Processor Trace documentation I had just read, I could see a path\nto a user-friendly tool. So I drafted a new intern project document,\nand for the second half of his internship, Chris Lambert and I worked\non putting it together.\n\nThe key idea behind quickly making a useful tool was to limit the\nscope:\n\n\n  We’d focus on the circular buffer mode, where it overwrites old data\nuntil you snapshot it after something interesting happens. Processor\nTrace can save all data, but doing so creates 1GB of trace file\nper second.\n  We’d trigger the snapshots based on a function call in the target\nprogram. 
There are lots of other possibilities for deciding when to\nsnapshot, but calling a function is very flexible, especially if you\nput it behind custom logic waiting for tail latency events or\nsomething.\n  We’d only visualize function calls and returns, and only on a trace\ntimeline. Processor Trace gives you full control-flow data and in\ntheory you could visualize down to individual lines, but that ends up\nbeing too much data to deal with.\n\n\nThe first stage was to implement the tool as a wrapper around the\nLinux perf tool’s Processor Trace functionality, and Chris blazed\nthrough it in under a week. Sending the SIGUSR2 signal to perf\ncaused it to take a snapshot, so Chris wrote a\nMagic_trace.take_snapshot function that sent SIGUSR2 to the parent\npid. Then he wrote a parser and call-stack reconstructor to turn the\nperf script text dump of all branches into a trace that handled\nOCaml features like tail-calls and some exceptions.\n\nIt was pretty exciting looking through the first traces and being able\nto zoom in and see the smallest details, and immediately noticing\nthings like that OCaml program startup time was mostly composed of\nmodule initializers page faulting in random parts of the binary.\n\nDirectly using kernel APIs and libipt\n\nThen we embarked on something harder. Parsing the output of the perf\ntool was slow and couldn’t do the instruction-level decoding needed for\nproperly handling pushes and pops to the OCaml exception handler stack.\nWe decided to try directly using the kernel perf_event_open API\nand Intel’s libipt decoding library.\n\nThis turned out to be quite tricky, as we couldn’t find any evidence\nanyone had ever tried directly integrating perf_event_open with\nlibipt before. 
I ended up spending my days poring over documentation\nand source code of libipt and the perf tool to figure out how to\ndo things we hadn’t understood yet and handing answers and example\nlinks over to Chris, who wrote and debugged the C code to interface\nwith the APIs and with OCaml.\n\nAfter lots of research and debugging, by the end of his internship we’d\nmanaged to get a trace of events out of our from-scratch\nimplementation. After Chris left I debugged the remaining issues and\nplumbed it in fully. Hopefully now that we’ve published a reference\ncodebase, anyone else attempting this will have an easier time.\n\nHardware breakpoints for seamless triggering\n\nAfter Chris left and things were working, the biggest feature that we\nneeded to make useful and easy was the ability to attach to existing\nprocesses.  Unfortunately this broke our parent-SIGUSR2-based\nsnapshot signalling. I wanted Magic_trace.take_snapshot to have close\nto zero overhead while magic-trace wasn’t attached, and low overhead\neven when it did trigger a snapshot. I thought I might have to have\nevery process host a tiny IPC server or use ptrace, but I\nwasn’t happy with those solutions.\n\nI spent a bunch of time looking for a better solution and eventually I\nfound a really satisfying one in the perf_event_open docs. It\nturns out that perf_event_open can use hardware breakpoints and\nnotify you when a memory address is executed or accessed.\n\nThe cool thing about this approach is that it requires no cooperation\nfrom the target, no overhead when not attached, and can actually be\nused on any function we want, not just a special\nMagic_trace.take_snapshot function. 
When we do use it on a special\nfunction, we can sample registers so we can see the arguments it was\ncalled with, allowing the user to include metadata with their\nsnapshot.\n\nI think it says something interesting about my programming aesthetic\nthat I spent a whole day researching alternatives to adding a tiny IPC\nserver and ended up using a niche kernel API and hardware feature. I\nknew the hardware allowed a design which didn’t require recompiling or\nadding extra bloat to processes that weren’t being traced, and I really\nwanted to make first-time use as smooth as possible and avoid bloating\neveryone else’s programs. If I did go the IPC route, I was at least\ngoing to use less-obscure-but-still-rare Linux-only abstract domain\nsockets (named by the PID) to avoid having to clean up files or deal\nwith ports. Sometimes standard approaches can’t get you to an ideal\nuser experience, but they’re easier for your coworkers to maintain, you\nrun into fewer issues, and need to do less research. This tradeoff\nleaves low-hanging fruit for people who enjoy diving deep into obscure\ndocumentation and debugging weird issues, which can tip the balance.\nHardware breakpoints, the whole magic-trace project, and other\nprojects of mine are all the result of delighting in asking myself\n“could I obliterate this problem by being willing to do cursed things?”\n\nKernel bugs and hardware bugs, the perils of being early\n\nPeople have sometimes used Processor Trace, and it mostly works, but\nI’ve learned that when using niche and complicated new hardware, I\ncan’t have the same low priors as I usually do about bugs being due to\nthe kernel or hardware.\n\nI was excited to be able to try my hand at kernel debugging for the\nfirst time when I discovered a way to crash the kernel using a\nspecific unusual combination of Processor Trace features. 
I used info\nfrom the kernel core dump, and read through control flow paths and\nrecent patches in the kernel, to figure out the reason for the null\npointer access. It turns out a patch added a flag that made one piece\nof state invalid to access, but missed guarding it with an if\nstatement in one place. Exactly the kind of bug that algebraic data\ntypes in OCaml/Rust/etc help you avoid :)\n\nAnother bug was much more mysterious and difficult. On exactly one\nprogram out of any I tried, Processor Trace would mysteriously stop\nadding events to the trace buffer before it reached the snapshot\npoint. I spent 2 weeks adding various sorts of observability and\nfixing other issues that got in the way (so at least magic-trace\nended up better regardless), and couldn’t find any sensible software\ncause, e.g. a context switch that lined up with when the recording\nstopped. Finally I tried running it on a newer generation of Intel\nprocessors and the problem went away. I suspect it may be Intel\nerratum SKL171 “Intel® PT May Drop All Packets After an Internal\nBuffer Overflow” which happens under a “rare microarchitectural\ncondition”, although it still might be some race condition kernel bug\nthat’s very consistent only in the older hardware.\n\nSolving tricky problems\n\nPeople have only been using magic-trace internally for about a month\nbut we’ve already made good use of it.\n\nThe original design goal was to help with performance problems in\nhigh-performance trading systems that sampling profilers are hopeless\nfor, and that’s panned out. It helped identify a 100ns performance\nregression caused by a patch that turned out to cause a function call\nnot to be inlined. It also helped diagnose why a new compiler version\nmade a trading system slower, which also turned out to come down to an\ninlining decision.\n\nBut after we built magic-trace, we realized it could help with another\nkind of difficult performance problem that people at Jane Street\nencounter frequently. 
We use Async to cooperatively handle many\nconcurrent tasks. The “cooperatively” part means that if one task\ntakes too long then all other tasks have to wait for it. If you have\nan issue that causes a task to handle way more work than usual, it can\ncause a “long async cycle”. These can be really tricky to debug if\nthey only happen occasionally, since you don’t get any info about what\ncode was too slow. Previously people have resorted to capturing\nenormously long perf profiles and then using logged monotonic\ntimestamps to filter the profile to the relevant region.\n\nNow with magic-trace people can just add a snippet of code that\ncalls Magic_trace.take_snapshot after cycles over a certain length,\nand then attach magic-trace and wait for it to capture. Even if a\nlong cycle is 15 seconds, the last 10 milliseconds of the job are\nnormally the same uniform large batch of work, so you can just look\nback in the trace to see which code is doing too much work. We’ve\nalready used this to solve one tricky issue where there were way more\nitems in a certain collection than expected and a loop was spending\nseconds working over them. Sampling profile filtering would’ve been\nharder and wouldn’t have been able to tell whether the function was\nlooping too many times instead of, say, taking a really long time once\nor just always being somewhat slow.\n\nEven if magic-trace is only indispensable for certain\nhigh-performance code, as a user-friendly performance tool in a\ntoolbox it can be useful for all sorts of debugging and performance\nproblems just by being quicker and easier to use than alternatives.\n\nHow you can use magic-trace\n\nWe designed magic-trace for our own OCaml code but now you can use\nit on any native executable with symbol names (e.g. a C++ or Rust\nprogram) as long as you have a new enough Intel Linux machine. 
Here’s\nhow:\n\n# If this prints 1 your machine supports Processor Trace with precise timing\ncat /sys/bus/event_source/devices/intel_pt/caps/psb_cyc\n# Install Opam (https://opam.ocaml.org/doc/Install.html), then OCaml 4.12.1\nopam switch create 4.12.1\nopam switch 4.12.1\n# Add the Jane Street pre-release repo, magic-trace isn't on public Opam yet\nopam repo add janestreet-bleeding https://ocaml.janestreet.com/opam-repository\n# Install the magic-trace command line tool\nopam install magic-trace\n# This lets you fuzzy-search a process to attach to and a symbol to snapshot on\nmagic-trace attach -symbol '' -output magic.ftf\n# Go to https://ui.perfetto.dev/ and open the trace file\n\n\n\nNobody else has used Perfetto for traces like this before so we also\nneeded to leave our OCaml and C extensions to the land of TypeScript\nand patch Perfetto to support zooming to nanosecond-levels. The\nmain Perfetto UI works, and we hope to upstream some patches to\nit, but for the best experience you can build the UI from our\nfork.\n\nLet’s use more Processor Trace!\n\nIntel Processor Trace is an incredibly powerful and cool technology, and\nit’s an absolute shame that more people don’t use it. I hope that\nmagic-trace shows people how useful Processor Trace and technologies\nlike it can be, and makes it easier to use Processor Trace by providing\nan example codebase.\n\nOne way to build tools on top of Processor Trace that I haven’t\nmentioned yet is perf-dlfilter, which allows you to consume\nperf events using an efficient C API with a shared library rather\nthan parsing text output. We didn’t use it because it was just being\nsubmitted to the kernel as we were writing magic-trace; we didn’t\nlearn about it until I stumbled upon it months later. 
I’d recommend\nthat tools try to start with perf and dlfilter rather than\nperf_event_open and libipt, as it just implements tons of stuff\nyou’ll otherwise need to reimplement.\n\nAt the end of his internship, Chris even suggested that with hindsight\nwe should have forked perf to add a C interface rather than\nembarking on the libipt route – and luckily someone else did, with\nthe specific goal of efficiently reading Processor Trace events! You\ndon’t even need a super new kernel, because you can compile the perf\ntool from a newer kernel tree and use it on an older kernel.\n\nHere are some more ideas we’ve thought of for Processor Trace tooling\nthat nobody’s built yet and that we might build into magic-trace:\n\n\n  Visualizing control flow at the level of individual lines of code,\nperhaps with a custom trace viewer designed for Processor Trace\nwhich lazily decodes the lowest levels again as you zoom in.\n  Feedback Directed Optimization of low-latency systems by optimizing\nbased on recorded control flow in the latency-critical case.\n  Using Processor Trace to evaluate compiler optimizations by counting\nthe number of actually executed register spills, page faults,\netc. on benchmarks.\n  Building efficient instrumentation-based tracing on top of the\nPTWRITE instruction in the latest processors, which allows adding\ndata into the trace.\n  Using the “PEBS via PT” feature on very new processors to sample cache\nmisses or branch mispredicts and add them to the trace so you can\nnotice why your function is slow.\n  Using root permissions to record every process on every core plus the\nkernel and combining it with task switch information to visualize\nliterally everything the processor was doing during a time slice so\nmysterious performance problems have nowhere left to hide.\n\n\nIf more people use Processor Trace, more VM hypervisors will support\nit, Intel will hopefully invest more in it as a differentiating factor,\nand perhaps AMD will try to catch up. 
And if you want to use Processor\nTrace today, try magic-trace on your own problems, or apply to Jane\nStreet and come try it on ours: we’re always hiring!\n",
        "url"      : "https://blog.janestreet.com/magic-trace/",
        "image"    : "https://blog.janestreet.com/magic-trace/magic-trace-blog-image.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Hiring a Developer Educator",
        "date"     : "October 21, 2021",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "We spend a lot of time on education at Jane Street.  Like, really a\nlot.\n\nWe have an OCaml Bootcamp for new traders, classes for new non-traders\nto learn about trading, a class to teach traders and researchers how\nto use Python effectively, a cycle of more advanced classes for\nfull-time devs in their first year, classes aimed at interns, classes\nfor potential hires. The list goes on.\n\nAll of that teaching is done by people here as a kind of add-on to\ntheir ordinary work.  What we don’t have is someone who comes in with\ndeep experience as an educator, and whose career is focused on\neducation.  And that’s the spot we’re looking to fill.\n\nWe’re still figuring out how the role should work, but there are some\nthings we know.\n\n\n  \n    We want someone who is a skilled, effective, and experienced\nteacher.\n  \n  \n    The job demands an understanding of our technology stack, which\nmeans that that person needs to spend a material slice of their time\nworking with it.  That means we want someone who is good at and\nexcited about building software, and happy to spend a significant\nfraction of the time writing code.\n  \n  \n    Part of the job is going to be directly teaching and developing\ncurricula.  But an equally important part is helping other people\ninvolved in teaching grow their skills, so we want someone who is\ngood at mentoring other budding teachers.\n  \n\n\nWe’re excited to talk to people who have experience teaching CS topics\nin a university setting, as well as people who have spent time\nteaching technical topics in industrial settings.\n\nYou can apply\nhere,\nwhere you can see the official job description as well.\n",
        "url"      : "https://blog.janestreet.com/hiring-a-developer-educator/",
        "image"    : "https://blog.janestreet.com/hiring-a-developer-educator/teaching-blog.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Goodbye Core_kernel",
        "date"     : "August 26, 2021",
        "authorId" : "jdimino",
        "author"   : "Jeremie Dimino",
        "tags"     : ["core"],
        "minsToRead" : 3,
        "content"  : "We recently restructured our standard libraries at Jane Street in a\nway that eliminates the difference between Core_kernel and Core\nand we’re happy with the result. The new layout should reach the open\nsource world before the end of the year.\n\nOverview\n\nFor many years, Jane Street has had two variants of our Core\nstandard library:\n\n\n  \n    Core_kernel, which extends Base, and works on Javascript and\nWindows.\n  \n  \n    Core, which extends Core_kernel, adding functionality that uses\nthe Unix library.\n  \n\n\nWe eliminated the distinction between Core and Core_kernel by\nmoving (mostly non-portable) functionality from Core out to\nstandalone libraries, and then simply renamed Core_kernel as Core.\n\nRationale\n\nThe upsides of this change are clear enough:\n\n\n  \n    It’s simpler. Instead of three different flavors of\nour standard library, there are just two, hopefully with clearer\npurposes.\n  \n  \n    It encourages portability. The fact that Core wouldn’t work on\nJavaScript and Windows meant that lots of libraries that could have\nbeen portable weren’t. This change should fix that.\n  \n\n\nThis change is the culmination of work that has been going on for\nyears.  We had already moved most of the functionality out of Core,\ninto either Core_kernel or to standalone libraries that depended on\nCore.  In the end, there wasn’t that much code that distinguished\nCore from Core_kernel, so the change is smaller than it seems.\n\nHopefully, this all makes the differences between Base and Core\nclearer.  
In particular:\n\n\n  Base is lighter, much faster to compile, and more stable.\n  Core is more extensive, having both more useful libraries and\ndata-structures, as well as some more useful functionality broadly\nintegrated into it, like pervasive integration of bin_prot and\nbase_quickcheck.\n\n\nAnd both of them are fully portable.\n\nAdapting code to the change\n\nThe recommended way to adapt code to this change is to replace\nCore_kernel with Core and then to build with deprecation errors\nenabled.  We have left behind @@deprecated annotations throughout\nCore for functionality that was moved into standalone libraries.\nSo, you should get deprecation messages if your code needs further\nupdates.\n\nFor example, if you use Core.Date.format, you will get a deprecation\nmessage:\n\nError (alert deprecated): Core.Date.format\n[since 2021-03] Use [Date_unix]\n\n\n\nHere, Date_unix is the name of the new standalone library that\nextends Core.Date with Unix-dependent functionality, including\nDate_unix.format.\n\nHere’s a list of the modules that moved from Core to a standalone\nlibrary:\n\n\n  Old name → New name\n  Core.Command → Command_unix\n  Core.Date → Date_unix\n  Core.Filename → Filename_unix\n  Core.Signal → Signal_unix\n  Core.Sys → Sys_unix\n  Core.Thread → Core_thread\n  Core.Time → Time_unix\n  Core.Time_ns → Time_ns_unix\n  Core.Unix → Core_unix\n\n\nReleasing this change in the main opam repository\n\nThis change is good but is not backward compatible. As a result, some\ncare will be needed to release it into the opam repository. We have a\nfew ideas for how to make the transition smoother, but we haven’t\nsettled on a concrete plan yet. We plan to work more proactively on it\nstarting from September.\n",
        "url"      : "https://blog.janestreet.com/goodbye-Core_kernel/",
        "image"    : "https://blog.janestreet.com/goodbye-Core_kernel/core_kernel.png",
        "topic"    :  ["technology","core"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "What the interns have wrought, 2021 edition",
        "date"     : "August 9, 2021",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : ["internship"],
        "minsToRead" : 14,
        "content"  : "It’s the end of another dev internship season, and this one marked\nsomething of a transition, since halfway through the season, NY-based\ninterns were invited back to the recently reinvigorated office.  Which\nmeans that many more of us got the chance to meet and hang out with\nthe interns in person than we did last year.  And hopefully the\ninterns were able to get a better sense of Jane Street and how it\noperates.\n\nRemote or not, the internship is bigger than ever. We had 87 dev\ninterns between New York, London, and Hong Kong from 37 different\nschools and 21 different countries. As usual, there are way too many\ninteresting projects to describe them all. So I’ve picked just three\nprojects to describe in more depth. In particular:\n\n\n  Jose Rodriguez extended the functionality of a neat new syntax\nextension called ppx_typed_fields.\n  Erin Vuong built tools to make it easier for people to dig into\nhistorical information about the processes running on their boxes.\n  Ohad Rau worked on improving and using a new-ish library called\nDatafetcher, which helps you write easily testable data processing\njobs.\n\n\nIt’s worth saying that this year was an especially hard choice.  Here\nare just some of the projects I considered but chose not to write\nabout:\n\n\n  A visualizer for Intel’s processor trace tech\n  implementing graph-node fusion in an internal graph-computation\nsystem\n  implementing a distributed sorting algorithm on top of that same\nsystem\n  building a new meta-PPX to make it way easier to build new syntax\nextensions\n  providing better abstractions for specifying trading limits\n\n\nAnd the list goes on.\n\nAnyway, let’s look at the three projects we’re actually going to\ndiscuss in more detail.  
Remember that each one of these just\nrepresents half of what one intern got done this summer!\n\nTyped Fields\n\nJose worked with the Tools and Compilers team on adding functionality\nto a fairly new syntax extension called ppx_typed_fields.\n\nFirst, a bit of background: a syntax extension is basically a way of\nadding certain types of language features by writing code that\ntransforms the syntax of your program, auto-generating new code for\nyou to use.  Jane Street uses a lot of syntax extensions to automate\nthe generation of boring and repetitive code: things like comparison\nfunctions, hash\nfunctions, serializers and\ndeserializers, and so\non.\n\nppx_typed_fields provides some extra functionality around working\nwith records.  When you define a new record, OCaml by default provides\nyou with a few basic tools.  So if you write:\n\ntype t = { foo: int; bar: string }\n\n\n\nyou now have syntax for constructing the record, for pattern matching\non it, and for accessing fields individually.\n\n# let r = { foo = 3; bar = \"tomato\" }\nval r : t = {foo = 3; bar = \"tomato\"}\n# r.foo\n- : int = 3\n# let { foo; bar } = r in sprintf \"%d %s\" foo bar\n- : string = \"3 tomato\"\n\n\n\nThat’s nice enough, but it’s missing something useful: a first-class\npiece of data that represents a field.  Such a value would let you do\nthe ordinary things you might do with a record field, like update the\nfield, extract its value, or compute the field’s name.  But because\nthey’re first class, you can use them in a much more general context:\nyou can pass them to a function, serialize them to disk, build custom\niterators over them, etc.\n\nPpx_typed_fields fills this gap.  
Here’s what we’d write to redefine\nour type, but this time, deriving the Typed_fields.\n\ntype t = { foo: int; bar: string } [@@deriving typed_fields]\n\n\n\nThis generates some first class values that can be used for doing\nthings like reading values out of records, or doing a functional\nupdate to a record.\n\n# Typed_field.Foo\n- : int Typed_field.t = Typed_field.Foo\n# Typed_field.get Foo { foo = 5; bar = \"banana\" }\n- : int = 5\n# Typed_field.set Bar { foo = 5; bar = \"banana\" } \"asparagus\"\n- : t = {foo = 5; bar = \"asparagus\"}\n\n\n\nOne neat thing you can do is write field validators, and you can use\nthe fact that Typed_field knows the name of the field to generate a\nuseful error message.  Here’s the validator:\n\nlet validate_field (type a)\n      field (module M : Identifiable with type t = a) (lo,hi) record =\n   let v = Typed_field.get field record in\n   if M.(v &lt;= hi && v &gt;= lo) then Ok ()\n   else Error (sprintf \"Field %s out of bounds: %s is not in [%s,%s]\"\n     (Typed_field.name field) (M.to_string v)\n     (M.to_string lo) (M.to_string hi))\n\n\n\nAnd here’s how you’d use it:\n\n# let r = { foo = 3; bar = \"potato\" }\nval r : t = {foo = 3; bar = \"potato\"}\n# validate_field Foo (module Int) (5, 10) r;\n- : (unit, string) result =\nError \"Field foo out of bounds: 3 is not in [5,10]\"\n# validate_field Bar (module String) (\"apple\", \"banana\") r;\n- : (unit, string) result =\nError \"Field bar out of bounds: potato is not in [apple,banana]\"\n\n\n\nAnd that just scratches the surface of what you can do with typed\nfields.\n\nAll of this was in place by the time Jose started. 
But there were some\nkey missing features:\n\n\n  Typed fields didn’t work on all kinds of records, in particular,\nrecords with type parameters.\n  Jose added a new variant on the PPX that handles records inside of\nrecords, creating values that let you reference values multiple\nlevels deep in a nested data structure.\n\n\nThat’s all that was initially planned for the project, but Jose\nfinished up quickly enough that he had time for more, so he went ahead\nand added a typed-variants extension which does the analogous thing\nfor variant types.\n\nThe project itself was technically pretty challenging. First of all,\ntyped fields are an example of\nGADTs\n(Generalized Algebraic Data Types), which are an advanced feature of\nthe language that takes a bit of time to learn. In addition, Jose had\nto come to grips with Ppxlib,\nand had to generalize his code to cover a bunch of tricky special\ncases in OCaml’s menagerie of type definitions.\n\nppx_typed_fields is already pretty successful internally, having\nbeen picked up by a half-dozen different projects, and Jose’s\nextensions make it more widely applicable.  It’s also already\navailable as part of our open source\nrelease.\n\nHistorical atop\n\nErin Vuong’s project was all about observability.  One of the tools we\nuse for gathering data on our systems is\natop, which is a system-monitoring tool\nthat computes a lot of useful statistics, and gives you a way of\nlogging them.  The great thing about atop versus various other\nobservability tools we have is that it gives you detailed,\nprocess-by-process information about your system.\n\nWe log data from atop every ten seconds for every production box on\nour network, and we retain the last 30 days of logs in a central\nlocation. But our tooling around using this data is a little limited.\nYou can use the native atop user interface for rewinding to a given\ndate and time and seeing the state of atop at that moment.  
We’d also\nbuilt a web-service that lets you get at the same data, but it’s also\npretty limited, and in particular, only lets you browse data for a\nsingle day.  And if you wanted to do something else, well, there’s\nalways awk and sed…\n\nThat’s where Erin’s project came in. Her job was to surface\ntime-series data of the behavior of a particular system or process,\nand to be able to pull that data up across different days.  The end\nresult was simple enough: the data just gets uploaded to a\nspreadsheet for users to slice and dice and analyze to their heart’s\ncontent.  And the new functionality needed to be surfaced through two\ndifferent user-interfaces: the command line tools and the web-ui.\n\nHere’s what the web-ui looks like.  There’s not much there, and that’s\nkind of the point! The goal was to make it really simple for users.\n\n\n\nIndeed, part of the challenge here was in designing the various UIs to\nbe easy to use for novice users, while remaining powerful enough for\nadvanced users to get just the data they wanted.  So, for example,\nErin developed a filtering library that allowed filtering over a\nfamiliar regular expression, as well as a complex nested boolean\nexpression.  She also developed separate basic and expert commands, so\nusers wouldn’t be exposed to the extra complexity unless they needed\nit.\n\nHere’s what the basic interface looks like:\n\n\n\nAnd here’s one using a boolean expression:\n\n\n\nThe other challenge of this project was just discovering and resolving\na bunch of complex edge cases: making sure we didn’t break our\nspreadsheet by uploading too much data to it, making sure to do the\nwork incrementally, so you wouldn’t redownload atop data you already\nhad, etc.\n\nMemoization in Datafetcher\n\nOne of the things that trading desks need to do is to download various\nkinds of input data from a variety of sources in order to put together\ntheir trading data for the day.  
This can be surprisingly tricky and\ntime consuming, and it’s frustratingly hard to test.  That’s because\nthe work you do depends on a lot of external sources of data, and to\ntest it effectively, you need access to all that data.\n\nDatafetcher is a system that was developed on one of our trading desks\nfor making this kind of thing easier.  It’s loosely modeled on\nFacebook’s\nHaxl,\nand it provides you with a little language embedded in OCaml for\nexpressing data-fetching-and-processing jobs.  The leaf-nodes in the\ncomputation are some kind of “fetch”, an instruction for grabbing\nexternal data; and interior nodes in the computation can transform the\ndata, and even guide what data you want to fetch farther down the\npipeline.\n\nHere’s what a simple datafetcher program for computing the\nnet-asset-value (NAV) of an ETF might look like.\n\nlet get_price component ~date =\n  let open Datafetcher.Let_syntax in\n  match%bind Product_type.of_symbol component with\n  | Equity -&gt; Equity_prices.get_closing_price component ~date\n  | Future -&gt; Future_prices.get_settlement_price component ~date\n\nlet get_nav fund ~date =\n  let open Datafetcher.Let_syntax in\n  let%bind currency = Fx.get_currency fund\n  and component_values =\n    let%bind { basket; _ } = Composition.load_latest fund ~on_or_before:date in\n    Datafetcher.List.map (Map.to_alist basket) ~f:(fun (component, weight) -&gt;\n      let%map price = get_price component ~date in\n      Money.scale weight price)\n  in\n  Fx.add_in_given_currency component_values ~currency\n\n\n\nThe key win of Datafetcher is that it makes testing massively easier.\nBecause of how Datafetcher is structured, it can be run in multiple\nmodes:\n\n\n  \n    In production mode, fetches do the “real” work, kicking off whatever\nsequence of network requests is required to grab the data in\nquestion.\n  \n  \n    In test mode, jobs just read a version of the data that has been\npreviously stored on disk.\n  \n  \n    In record 
mode, it essentially runs a production job, but records\nthe data you get for use in tests.\n  \n\n\nThe end result is that testing your Datafetcher jobs is incredibly\ncheap.\n\nThat’s not all Datafetcher does for you. It can also batch requests\ntogether, which makes your data requests more efficient.  And it can\ncache requests, so if the same data is asked for multiple times in the\nsame job, it only needs to be fetched once.\n\nBut there’s a missing bit of functionality in Datafetcher, and that’s\nwhere Ohad Rau’s intern project came in.  While Datafetcher can cache\nexternal fetches, it didn’t cache computations built on top of those\nfetches. Which means that if your job runs the same computation over\nand over, then that’s going to be really inefficient.\n\nThis was discovered in the course of the desk migrating existing jobs\nover to Datafetcher, and Ohad’s job was to fix this caching problem.\nThis was tricky, because it required Ohad to mint an identifier for\neach subcomputation, so you could effectively detect that the\ncomputation had already been done in this run.\n\nThe end result was that instead of just caching computations at the\nleaves, Datafetcher now can memoize computations at any intermediate\nexpression.\n\nOhad finished up the memoization work in the first couple of weeks of\nhis internship, and then turned to converting an existing desk app to\nuse Datafetcher, and when he was done with that, writing a style guide\nfor Datafetcher to give traders who are using the system a sense of\nwhat reasonable best practices are.\n\nSound like fun?\n\nAs usual, my hope here is to give you a sense of the kinds of things\nyou might do here if you were an intern. And really, there are way\nmore interesting, challenging, fun projects here than I had a chance\nto talk about.\n\nSo, if it sounds like something you’d be interested in,\napply!  
The\ninternship program is a really great opportunity to learn a lot both\nabout software and about the world of trading. And you can learn more\nabout our interview process\nhere.\n",
        "url"      : "https://blog.janestreet.com/what-the-interns-have-wrought-2021/",
        "image"    : "https://blog.janestreet.com/what-the-interns-have-wrought-2021/internswrought_2021.jpg",
        "topic"    :  ["technology","internship"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Looking for a developer experience engineer",
        "date"     : "June 15, 2021",
        "authorId" : "toverby",
        "author"   : "Ty Overby",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "This role has been filled\n\nThe Jane Street Tools & Compilers team is looking to hire a developer\nwho will act as the primary contact point with users of our tools\nthroughout the firm.\n\nWe’re strong believers in the power of good tools, and as the team and\nour ambitions grow, we’d like someone to act as an advocate both\nfor our tools and for Jane Street’s developers, making sure that\nwe understand our developers’ needs and prioritize accordingly.\n\nPart of your job would be to act as a tools developer,\ncontributing to a wide variety of tools, including text editor\nplugins[1][2]\n, our code-review system\nIron,\nand the Dune build system.  All in, we expect\nyou’ll spend about half your time as a developer.\n\nWe’d also want you to spend time learning about development processes\nacross the firm; this will involve rotating through different teams\nin order to understand their challenges and help them onboard new\ntools and processes.\n\nYou’d also spend time teaching people how to use our tools,\nwhether it’s by participating in Jane Street’s 4-week Bootcamp\nprogram,\nwriting blog posts and tutorials, giving internal talks, or building\nand maintaining demo programs.\n\nFinally, you’d have a role in helping us plan and prioritize our\nwork.  One of the hardest things the Tools and Compilers team has to do\nis to figure out which of the many possible improvements we could make\nis worth doing.  A big part of your job would be to help us understand\nthe needs of Jane Street developers, and use that knowledge to take\npart in the planning process.\n\nWe’re looking for someone with a unique mix of qualities:\n\n\n  A love of the craft of software engineering, and a belief in the power of\ngreat tools.\n  Skill as a writer.\n  Strong relationship building and communication skills! 
A lot of the\njob is about reaching out to people you don’t know.\n\n\nMost of all, you need to derive real satisfaction from helping other people,\nsince that’s the core of the job.\n\nThis role has been filled\n",
        "url"      : "https://blog.janestreet.com/looking-for-a-developer-experience-engineer-index/",
        "image"    : "https://blog.janestreet.com/looking-for-a-developer-experience-engineer-index/generic_tech.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Growing the Hardcaml toolset",
        "date"     : "December 1, 2020",
        "authorId" : "aray",
        "author"   : "Andrew Ray",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "I am pleased to announce that we have recently released a slew of new\nHardcaml libraries!\n\nWhile I think the Hardcaml version now maintained at Jane Street is of\nhigher quality than the old open-source release, it has lacked the\nbreadth of tools that I initially supported. I think we can now claim\nto support a superset of the old functionality along with a much\nbetter testing story for these libraries.\n\nIn addition we have also been busy beavering away on some\nHardcaml documentation.\nRead an introduction\nto the libraries, work though an\nexample design,\nor fork a complete design\nthat takes you all the way from OCaml to an FPGA bitstream.\n\nThe documentation comes in the form of MDX files which are basically\nexecutable markdown files. The nice thing about this approach is our\ndocumentation will stop compiling if we change APIs and don’t properly\nupdate it.\n\nSo, whats new?\n\n\n\nhardcaml_circuits\nprovides a bunch of useful and/or interesting circuit designs. Use\nthem for real, or just to learn how to design and test hardware in\nHardcaml.  Choose from arbiters, high-speed multiplier architectures,\nsorting networks and much more!\n\nhardcaml_xilinx_components\nprovides an executable that can read the Xilinx Unisims library and\ngenerate OCaml modules for any one of thousands of basic FPGA\ncomponents.  The components can be instantiated in Hardcaml circuits\nin a friendly and type-safe manner.\n\nhardcaml_xilinx\nwraps up some Xilinx-specific FPGA components. A major focus is on\nwrapping RAM and FIFO primitives and providing models suitable for\nHardcaml simulations. An interesting feature implements the core\nHardcaml Comb.S module type using only FPGA LUT primitives. If you\nwant to do some really low level design stuff on an FPGA, you should\ntake a look.\n\nhardcaml_verify is\na set of verification tools, mainly centred around proving circuit\nproperties using SAT. 
And solving Sudoku of course!\n\nhardcaml_verilator\nand hardcaml_c are a\npair of high speed simulator backends with interfaces compatible with\nthe standard Hardcaml Cyclesim simulator.  These trade compilation\ntime for simulation performance.\n\nhardcaml_of_verilog\nloads Verilog code into Hardcaml!  Under the hood, it uses\nyosys (0.9) to convert verilog to\njson, then provides tools to convert that json into a Hardcaml\ncircuit.\n\nhardcaml_step_testbench\nis a monadic interface for driving Hardcaml simulations.  It provides\na notion of a spawned task which can perform some operation in\nparallel with other tasks. Everything synchronises on a clock step.\n\nhardcaml_fixed_point\nis a simple hardware fixed point datatype with all the usual\narithmetic operations.  It provides a rich set of rounding and\nclamping modes, and separate signed and unsigned types.\n\nUsing the libraries\n\nAll these libraries will become available in the mainline opam\nrepository along with the v0.15 release of the Jane Street packages.\n\nopam install hardcaml_circuits\n\n\n\nThe very latest versions are regularly released into the Jane Street\nbleeding edge opam\nrepository.  You can add this repository with\n\nopam repo add janestreet-bleeding https://ocaml.janestreet.com/opam-repository\n\n\n\nAnd then opam install ... the latest packages as normal.\n",
        "url"      : "https://blog.janestreet.com/growing-the-hardcaml-toolset-index/",
        "image"    : "https://blog.janestreet.com/growing-the-hardcaml-toolset-index/Hardcaml_blog_image_scaled.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Announcing Our Market Prediction Kaggle Competition",
        "date"     : "November 24, 2020",
        "authorId" : "cfalls",
        "author"   : "Craig Falls",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "Jane Street is running a Kaggle contest based on a real problem with\nreal financial data. If you like ML projects, or think you might,\nhead over and check it\nout.\nWe think it’s a pretty fun one. The prizes are pretty good too, with a\ntotal $100K being paid out.\n\nPrimarily our goal is to expose a new audience to some of the kinds of\nproblems that we face, in hopes that it will encourage them to apply.\nData science is a huge part of what we do, so the Kaggle community was\na pretty obvious one to reach out to. As usual, we’re hiring for\nbasically all roles in New York, London, and Hong Kong, so if you’ve\nbeen thinking about it, there’s no time like the\npresent.\n\nWe’re also interested to see what kind of approaches the Kagglers take\nand maybe even draw some inspiration from their submissions. If\nanything particularly interesting comes up, we’ll definitely write it\nup here.\n\nAll the details of the contest are on Kaggle’s\nsite.\n",
        "url"      : "https://blog.janestreet.com/announcing-our-market-prediction-kaggle-competition-index/",
        "image"    : "https://blog.janestreet.com/announcing-our-market-prediction-kaggle-competition-index/kaggle_blogpost.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Finding memory leaks with Memtrace",
        "date"     : "October 6, 2020",
        "authorId" : "lmaurer",
        "author"   : "Luke Maurer",
        "tags"     : [],
        "minsToRead" : 14,
        "content"  : "Memory issues can be hard to track down. A function that only\nallocates a few small objects can cause a space leak if it’s called\noften enough and those objects are never collected. Even then, many\nobjects are supposed to be long-lived. How can a tool, armed with data\non allocations and their lifetimes,\nhelp sort out the expected from the suspicious?\n\nThe Memtrace library and viewer are Jane Street’s new profiling\ntools, aiming to answer this question, and we’re excited to be\nreleasing them publically. We’ve been using them internally for a few\nmonths now, and we’re finding we can often find a space leak using\njust some basic data visualization and the human ability to say “Hang\non, what’s that?”\n\nMemtrace is built on top of the excellent new support for statistical\nmemory profiling that was added to the OCaml runtime in OCaml\n4.11. Special thanks to Jacques-Henri Jourdan at CNRS and our own\nStephen Dolan for all their great work on adding that support.\n\nThe new profiling support allows a program to get callbacks on garbage\ncollection events for a sample of the program’s allocations. The\nMemtrace library uses these callbacks to record allocation events in a\ncompact binary format. The Memtrace viewer then analyses the events\nand presents graphical views, as well as filters for interactively\nnarrowing the view until the source of the memory problem becomes\nclear.\n\nAs discussed on this blog,\nOCaml already had support for profiling memory using its Spacetime\nframework. Spacetime would gather data on all the allocations\nperformed by a program. It was very useful for finding memory leaks\nbut had a high overhead and required using a special configuration of\nthe OCaml compiler. 
Its profiles could become very large and require\nhuge amounts of memory to process.\n\nIn comparison, the new profiling support:\n\n  profiles a sample of the allocations, not all of them\n  works with the ordinary OCaml compiler\n  is supported by all platforms\n  can run with low enough overhead to be used in production systems\n\n\nGenerating a trace\n\nThere is no special compiler or runtime configuration needed to\ncollect a trace of your program. You need only link in the memtrace\nlibrary and add a line or two of code. The library is available in\nOPAM, so you can install it by running\n$ opam install memtrace\n\n\nand link it into your application by adding it to your dune file:\n(executable\n (name my_program)\n (libraries base foo bar memtrace))\n\n\n\nNow you need only add the code to create a trace. For most applications,\nit will suffice to add a call to Memtrace.trace_if_requested\nwherever you want to start tracing. Typically this will be right at\nthe beginning of your startup code:\n\nlet _ =\n  Memtrace.trace_if_requested (); (* &lt;-- new line *)\n  let cmd = process_command_line () in\n\n\n\nThis will check whether the environment variable MEMTRACE is set; if\nso, it stores the trace as $MEMTRACE, using the default sample rate\n(or $MEMTRACE_RATE if that’s also set), and continues tracing until\nthe program finishes. 
Finer control is available; see the Memtrace\nmodule for details.\n\nIf your program daemonizes by forking at startup, make sure that\ntrace_if_requested is called after forking, so that you trace the\nprocess that does the work rather than the one that exits quickly.\n\nNow simply run the program as usual but with MEMTRACE set:\n\n$ MEMTRACE=trace.ctf ./main.exe\n$ ls trace.ctf\ntrace.ctf\n$\n\n\nRunning the viewer\n\nFor the examples in this tutorial, we’ll use this trace of a build\nof the js_of_ocaml compiler which we have subtly altered to induce a\nspace leak.\n\n$ opam install memtrace_viewer\n$ memtrace-viewer js_of_ocaml-leaky.ctf\nProcessing js_of_ocaml-leaky.ctf...\nServing http://your-hostname:8080/\n\n\nUse -port to listen on a port other than the default 8080.\n\nOpen the URL to see the main Memtrace viewer interface:\n\n\n\nIn the middle of the screen is the flame graph. Each function that\nallocates appears at the top, and underneath each function we see its\ncallers, sized according to which callers caused the most allocations.\nNote the difference from the traditional flame graph, which subdivides\nupward into each function’s callees (our variation is often called an\nicicle graph). Any caller that would account for less than 0.1% of\ntotal allocations is dropped from the graph.\n\nIn the top-right corner is a graph of memory usage (specifically, the\ntotal live words in the OCaml heap) over time. What the totals can’t\ntell us, however, is when the allocations contributing to the peak\noccurred. Perhaps each peak consists entirely of new objects, or\nperhaps objects from much earlier are accumulating. To tease this out,\nwe can narrow our view to those allocations whose objects remain live\nat the end of the trace, or wherever the peak happens to be. 
Here, I\nset Only show allocations: Live somewhere between to 11 s and 12\ns, and click Apply:\n\n\n\nWe can see that there is a steady stream of allocations the entire\ntime, though the flame graph doesn’t point to any particular culprit.\nLooking at the line graph again, however, we now see a blue line for\nthe total allocations which pass the filter. Keeping in mind that\nwe’re looking at a compiler, we can distinguish three phases:\n\n\n  Some initial work that probably includes fixed startup costs, as\nwell as things like identifiers that are allocated during parsing\nand may still be relevant at the end.\n  A gradual accumulation that continues throughout processing.\n  A final set of allocations that could easily be the construction of\nthe target code just before output.\n\n\nPhases 1 and 3 have innocent explanations, but phase 2 seems\nsuspicious. Therefore let’s narrow our filter again, looking at just\nthe allocations that take place in the middle and remain live at the\nend. Setting Only show allocations: Occurring between to 3 s and 7\ns, we see:\n\n\n\nNow it’s quite obvious that the traverse function for directed graphs\nis allocating a large amount via a call to Array.init. Importantly,\na traversal function isn’t something we expect to be\nallocating much memory, certainly not permanently. And one look at the\ncode for traverse reveals our mischief:\n\n  let rec traverse g to_visit stack x =\n    wasted_memory := Array.init 5 ~f:(fun _ -&gt; 42) :: !wasted_memory;\n    if NSet.mem to_visit x\n\n\n\nOne can only wish memory bugs were always that obvious.\n\nOther viewer features\n\nTable\n\nOften the flame graph is too noisy if you just want to see the top\nallocators. 
Click the Table tab for a simpler summary view:\n\n\n\nSelect a row and press the → key to subdivide a function by its\ncallers (much as the flame graph does for all functions).\n\nDirection\n\nBy default, the flame graph and the table focus on allocation sites,\nwhich is good for finding memory leaks when they are caused by\nlocalized problems. For a broader view, you can switch to a\ntraditional flame graph by selecting Explore: Upwards from “main”\nfrom the Filter panel and clicking Apply.\n\n\n\nZooming\n\nNodes in the flame graph can easily get too narrow to read, and the\ntable only shows the allocation sites (or top-level code, when\nexploring upwards) by default. In either view, you can select a\nfunction and click Zoom In in the Zoom panel. In the flame graph,\nthis redraws the graph starting from the selected node:\n\n\n\nIn the table view, this shows the callers or callees of the zoomed\nfunction (depending on the direction):\n\n\n\nYou can click Zoom Out to remove one level of zoom, or Reset\nZoom to remove all zoom.\n\nWriting a custom analysis\n\nOf course, memory bugs can be subtler than the one we’ve been looking\nat, and no viewer can implement every possible analysis you might want\nto perform. Fortunately, the Memtrace library itself offers a simple\nyet powerful way to slice the data any way you can think of.\n\nLet’s have a look at another trace, where we’ve induced a different\nbug in js_of_ocaml:\n\n\n\nWe can start by again looking at what’s live at the end of the program,\nby using Live somewhere between to select a range:\n\n\n\nWe can see that most of the memory live at the end of the program was\nallocated recently. If there were a straightforward memory leak, we’d\nsee lots of allocations from throughout the program that were live at\nthe end, so we can rule that out.\n\nNext, let’s look at somewhere in the middle of the program. 
If we set\nLive somewhere between to 14–27.4, and Occurring between to 0–14,\nthen we’ll see a snapshot of the heap at time t=14, showing all of\nthe values that were allocated before that time and live after it:\n\n\n\nThis is a much more suspicious-looking graph. Notice the steep\nallocation rate in the blue line before t=14, showing that values live\nat t=14 were being continuously allocated for about four seconds.\nAll of these values are eventually freed (as shown by the stepped blue\nline after t=14), but you can see that this line is less steep.\n\nThis means that new values are being created faster than they are\nbeing consumed. This is a common problem in systems that use a queue\nto connect components: if the producer outpaces the consumer and no\nbackpressure is applied, then the queue can balloon to larger and\nlarger sizes.\n\nTo get a hint as to what sort of values are being created, we can look\nat the flame graph. Most of the allocations here are indeed being\ncreated by Stdlib.Queue.add, called by Code.queue_operations\n(mouse over the node in the flame graph to see the full name), which\nalso points to a ballooning queue.\n\n\n\nTo analyse the queueing performance of our modified js_of_ocaml,\nit’s useful to see how the distribution of queued objects’ lifetimes\nevolves over time. Queues often contains values that are discarded\nshortly after they’re consumed, so the lifetime of a queued item is a\nreasonably good proxy for how long it spent queued.\n\nUnfortunately, the Memtrace viewer doesn’t (yet!) have a way to\nanalyse lifetimes like this. Here’s where writing a custom analysis\nwill be useful. Let’s focus on allocations\nthat happen somewhere inside Code.queue_operations. 
We can detect\nthese by scanning each allocation’s backtrace for that function name,\nand measure its lifetime by subtracting the allocation timestamp from\nthe collection timestamp.\n\nWe need to specially handle allocations that are never collected.\nHere, let’s count them as being collected at the time the program\nterminates.\n\nThe complete analysis program is:\nopen Memtrace.Trace\nopen Base\nopen Stdio\n\nlet print_alloc alloc_time collect_time =\n  let alloc_time = Timedelta.to_int64 alloc_time in\n  let collect_time = Timedelta.to_int64 collect_time in\n  printf\n    \"%f %f\\n\"\n    (Float.of_int64 alloc_time /. 1_000_000.)\n    (Float.of_int64 Int64.(collect_time - alloc_time) /. 1_000_000.)\n;;\n\nlet () =\n  let trace = Reader.open_ ~filename:(Sys.get_argv ()).(1) in\n  let allocs = Obj_id.Tbl.create 20 in\n  let last_timestamp = ref None in\n  Reader.iter trace (fun time ev -&gt;\n    last_timestamp := Some time;\n    match ev with\n    | Alloc alloc -&gt;\n      if Array.existsi alloc.backtrace_buffer ~f:(fun i loc -&gt;\n        i &lt; alloc.backtrace_length\n        && List.exists (Reader.lookup_location_code trace loc) ~f:(fun loc -&gt;\n          String.equal\n            loc.defname\n            \"Js_of_ocaml_compiler__Code.queue_operations\"))\n      then Obj_id.Tbl.add allocs alloc.obj_id time\n    | Collect id when Obj_id.Tbl.mem allocs id -&gt;\n      print_alloc (Obj_id.Tbl.find allocs id) time;\n      Obj_id.Tbl.remove allocs id\n    | _ -&gt; ());\n  allocs\n  |&gt; Obj_id.Tbl.iter (fun _ alloc_time -&gt;\n    print_alloc alloc_time (Option.value_exn !last_timestamp));\n  Reader.close trace\n;;\n\n\n\nThis could be made faster: for instance,\nscanning the whole backtrace and comparing strings on each frame is\nnot efficient. But running it on our example trace takes about 25 ms,\nso let’s not worry too much.\n\nThis outputs a series of (allocation time) (lifetime) pairs, with times\nin seconds. 
Memtrace internally keeps microsecond-precision\ntimestamps for everything, which is why the numbers are divided by\n1_000_000. before printing.\n\nTo compile our analysis, you can use Dune with this simple dune file:\n\n(executable\n (name lifetimes_of_queued_objects)\n (libraries base stdio memtrace))\n\n\n\nWe can view the results by piping into gnuplot:\n\n$ dune build\nDone: 29/29 (jobs: 1)\n$ _build/default/lifetimes_of_queued_objects.exe js_of_ocaml-queue.ctf | gnuplot -p -e \"plot '-'\"\n\n\n\n\n\nThe x-axis is time since program start, and the y-axis is allocation\nlifetime, both in seconds.\n\nThere are lots of downward strokes on this graph. This is an artifact\nof how GC works: many values allocated over a period of several\nseconds are collected all at once, giving the ones allocated later a\nslightly shorter lifetime. This effect is particularly pronounced at\nthe end, where we deemed all of the uncollected values to be collected\nright as the program ended.\n\nNonetheless, we can clearly see a backlogged queue here. The time\ntaken to process a queue item starts low, but grows steadily over time.\nAt the peak (t = 15 or so), items are taking over 12 seconds to be\nprocessed.\n\nThe future\n\nThis is still a new project. Features we’re working on for the\nMemtrace viewer include more sophisticated filtering, including\nfiltering by module, and the ability to operate on data streamed from\na live process. We’re also eager to see what other analyses and\nvisualizations people come up with using the Memtrace library.\n\nIf you have any problems with the library or the viewer, or if you\nhave an idea for how to make them better, let us know by filing an\nissue on GitHub (library, viewer).\n\n",
        "url"      : "https://blog.janestreet.com/finding-memory-leaks-with-memtrace/",
        "image"    : "https://blog.janestreet.com/finding-memory-leaks-with-memtrace/memory-leak.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Memory allocator showdown",
        "date"     : "September 15, 2020",
        "authorId" : "sdolan",
        "author"   : "Stephen Dolan",
        "tags"     : [],
        "minsToRead" : 7,
        "content"  : "(Image by Theresa Bloise)\n\nSince version 4.10, OCaml offers a new best-fit memory allocator\nalongside its existing default, the next-fit allocator. At Jane\nStreet, we’ve seen a big improvement after switching over to the new\nallocator.\n\nThis post isn’t about how the new allocator works. For that, the best\nsource is these notes from a talk by its\nauthor.\n\nInstead, this post is about just how tricky it is to compare two\nallocators in a reasonable way, especially for a garbage-collected\nsystem.\n\nBenchmarks\n\nOne of the benchmarks we looked at actually slowed down when we switched\nallocator, going from this (next-fit):\n$ time ./bench.exe 50000\nreal    0m34.282s\n\n\n\nto this (best-fit):\n$ time ./bench.exe 50000\nreal    0m36.115s\n\n\n\nBut that’s not the whole story. The best-fit memory allocator reduces\nfragmentation, packing allocations together more tightly. If we\nmeasure both time spent and memory used, we see there’s a trade-off\nhere, with best-fit running slightly slower but using less memory:\n\n\n\nBut that’s not the whole story either. It would be, in a language with\nmanual memory management, where the allocator has to deal with a\nsequence of malloc and free calls determined by the program.  On\nthe other hand, in a language with garbage collection, the GC gets to\nchoose when memory is freed. By collecting more slowly, we free later,\nusing more memory and less time. Adjusting the GC rate trades space\nand time.\n\nSo, in a GC’d language the performance of a program is not described\nby a single (space, time) pair, but by a curve of (space, time) pairs\ndescribing the available tradeoff. The way to make this tradeoff in\nOCaml is to adjust the space_overhead parameter from its default\nof 80. We ran the same benchmark with space_overhead varying from 20\nto 320 (that is, from 1/4 to 4x its default value), giving us a more\ncomplete space/time curve for this benchmark. 
The benchmark is\nrelatively noisy, but we can still see a separation between best-fit\nand next-fit:\n\n\n\nHere, best-fit handily beats next fit, whether optimising for time or\nspace. Note that for every blue point there is an orange point below\nand left of it, likely with a different space_overhead value. (Also\nnote that these numbers come from one of the benchmarks that best-fit\nperformed the worst on.)\n\nIn the default configuration, best-fit picks a point on the curve\nthat’s a bit further to the right than next-fit: it’s optimising more\naggressively for space use than time spent. In hindsight, this is to\nbe expected: internally, the space_overhead measure does not take\ninto account fragmentation, so for a given space_overhead value\nbest-fit will use less space than next-fit, as it fragments less.\n\nThat’s almost the whole story. There are just two questions left:\nwhat exactly do mean mean by “memory use” and where did the curves\ncome from?\n\nMeasuring memory use\n\nThe y axis above is marked “memory use”. There are suprisingly many\nways to measure memory use, and picking the wrong one can be\nmisleading. The most obvious candidate is OCaml’s top_heap_size,\navailable from Gc.stat. This can mislead for two\nreasons:\n\n\n  \n    It’s quantized: OCaml grows the heap in large chunks, so a minor\nimprovement in memory use (e.g. 5-10%) often won’t affect\ntop_heap_size at all.\n  \n  \n    It’s an overestimate: Often, not all of the\nheap is used. The degree to which this occurs depends on the\nallocator.\n  \n\n\nInstead, the memory axis above shows Linux’s measurements of RSS (this\nis printed by /usr/bin/time, and is one of the columns in top). RSS is\n“resident set size”, the amount of actual physical RAM in use by the\nprogram. This is generally less than the amount allocated: Linux waits\nuntil memory is used before allocating RAM to it, so that the RAM can\nbe used more usefully (e.g. as disk cache) in the meantime. 
(This\nbehaviour is not the same thing as VM overcommit: Linux allocates RAM\nlazily regardless of the overcommit setting. If overcommit is disabled, it\nwill only allow allocations if there’s enough RAM+swap to handle all\nof them being used simultaneously, but even in this case it will\npopulate them lazily, preferring to use RAM as cache in the\nmeantime).\n\nThe relationship between top_heap_size and RSS differs between allocators:\n\n\n\nThis graph shows the same benchmark run with different iteration\ncounts. Each datapoint is a separate run of the program, whose memory\nuse is larger with larger iteration counts. The RSS lines are shifted\nvertically to align at the left: without this change, the RSS lines\nare larger than the heap size because they also include binary\nsize. The shifted RSS lines slightly exceed top heap size at the right\nof the graph, since not quite all of the memory allocated is heap\n(this happens on both but is more obvious on next-fit).\n\nNotice that the next-fit allocator often uses all of the memory it\nallocates: when this allocator finds the large empty block at the end\nof the heap, it latches on and allocates from it until it’s empty and\nthe entire allocated heap has been used. Best-fit, by contrast,\nmanages to fit new allocations into holes in the heap, and only draws\nfrom the large empty block when it needs to. This means that the\nmemory use is lower: even when the heap expands, best-fit does not\ncause more RAM to be used until it’s needed. In other words, there is\nan additional space improvement of switching to best-fit that does not\nshow up in measurements of top_heap_size, which is why the first graph\nabove plots memory as measured by RSS.\n\nModelling the major GC\n\nThe curves in the first graph above are derived by fitting a\nthree-parameter model to the runtime and space usage data\npoints. 
Here’s how that model is derived, and roughly what the\nparameters mean.\n\nThe time taken by major collection, under the standard but\nnot-particularly-reasonable assumption that all cycles are the same,\nis (mark_time + sweep_time) * #cycles. Mark time is proportional to\nthe size of live heap (a property of the program itself, independent\nof GC settings like space_overhead), and sweep time is proportional\nto the size of the live heap + G, the amount of garbage collected\nper cycle. This amount G is roughly the amount allocated, so the\nnumber of cycles is roughly the total allocations (another property of\nthe program itself) divided by G.\n\nThe result is that the total runtime is roughly some affine linear\nfunction of (1/G), and the total heap size is roughly G plus a\nconstant. That means that heap size is a function of runtime as\nfollows:\n\nH = 1/(t-a) * b + c\n\n\n\nfor three constants a, b, c. Fitting this 3-parameter model\ngives the curves in the original graph.\n\nThe parameters a, b and c have straightforward\ninterpretations. a is the vertical asymptote, which is the minimum\namount of time the program can take if it does no collection at\nall. This consists of the program code plus the allocator, so best-fit\nimproves a by being faster to allocate. c is the horizontal\nasymptote, the minimum amount of space the program can use if it\ncollects continuously. This consists of the live data plus any space\nlost to fragmentation, so best-fit improves c by fragmenting\nless. Finally, b determines the shape of the curve between the two\nasymptotes. 
This is broadly similar between the two allocators, since\nchanging the allocator doesn’t strongly affect how fast marking and\nsweeping can be done (although stay tuned here, as there’s some\nwork in progress on speeding up marking and sweeping with\nprefetching).\n\nConclusion\n\nSwitching allocators from next-fit to best-fit has made most programs\nfaster and smaller, but it’s surprising how much work it took to be\nable to say that confidently!\n",
        "url"      : "https://blog.janestreet.com/memory-allocator-showdown/",
        "image"    : "https://blog.janestreet.com/memory-allocator-showdown/MemoryAllocator.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Announcing Signals and Threads, a new podcast from Jane Street",
        "date"     : "August 31, 2020",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "I’m excited (and slightly terrified) to announce that Jane Street is\nreleasing a new podcast, called Signals and\nThreads, and I’m going to be the\nhost.\n\nThe idea for the podcast came up as we were discussing what would\nbecome of our public tech talk\nseries and our on-campus\ntalks in an era where in-person talks are not so easy to arrange.  It\noccurred to us that a podcast might be a fun alternative.\n\nAnd it has been fun! More fun than I expected, really.  The structure\nof the podcast is simple enough: each episode is a conversation\nbetween me and a different one of Jane Street’s engineers, diving deep\ninto that person’s expertise.  The topics are varied; language design,\nclock synchronization, the role of IP multicast in markets, the ups\nand downs of reconfigurable hardware, and so on.\n\nOur initial thought was that this would be a good way of communicating\nwith potential recruits (you do know a big part of why we do all these\ntech talks is to hire great\npeople, right?), but as we\ngot into the process, we came to realize that these were conversations\nthat would interest the people here at Jane Street as well, and we’re\nhoping this ends up being useful for both internal and external\naudiences.\n\nIn any case, we’ve been working hard on it, and we hope you enjoy the\nresults.\n\nThe trailer should be out today, September 2nd.  We’ll release one\nepisode each week, on Wednesday morning.  The first episode will be an\ninterview with the inimitable Andy\nRay who leads our hardware\nengineering team, and is the original author of\nHardcaml, an OCaml library\nfor designing hardware.\n\nYou can listen and subscribe to Signals and Threads using the links\nbelow, or you can listen directly from the podcast’s own\nwebsite.\n\n\n\t\n\n\t\n\t\t\n\t\n\t\n\t\t\n\t\n\t\n\t\t\n\t\n\n",
        "url"      : "https://blog.janestreet.com/announcing-signals-and-threads-index/",
        "image"    : "https://blog.janestreet.com/announcing-signals-and-threads-index/signals-and-threads.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "What the interns have wrought, 2020 edition",
        "date"     : "August 17, 2020",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : ["internship"],
        "minsToRead" : 12,
        "content"  : "It’s been an unusual internship season.\n\nLike many companies, Jane Street is operating in distributed mode, and\nthat goes for the internship as well.  That meant doing a bunch of\nthings differently, including rethinking how we got the interns up to\nspeed and assigned them to projects and teams.\n\nOne change we made was to how interns are assigned to projects.  In an\nordinary year, interns do two separate projects in totally different\nparts of the tech organization: a single intern might work on an LDAP\nimplementation\nfor the first half of the summer, and on tools for caching snapshots\nfrom exchange-sourced\nmarketdata\nfor the second half.\n\nWhile we think that kind of diverse experience has its upsides, we\ndidn’t feel like we could do it justice this year, mostly because of\ntime constraints. So this year we assigned each intern to a single\nteam for the whole summer.\n\nThat meant that there were fewer projects per intern, but there are\nstill way too many projects to discuss all of them!  So, as usual, I\npicked a few to go into more detail on:\n\n\n  Henry Nelson’s project to build a\nWireshark plugin for Async-RPC, a\ncommon internal messaging format. Along the way, Henry built a\ngeneral purpose library for building Wireshark plugins in OCaml.\n  Yulan Zhang’s project to build an application to automatically shard\ntrading systems to better balance the resources they needed and\nthereby improve performance.\n  Eric Martin’s project to help migrate us off an old-and-deprecated\nregular expression library, leveraging some fancy testing.\n\n\nNow let’s talk about each project in a bit more detail.\n\nAn Async-RPC dissector for Wireshark\n\n\n\nWireshark is an interactive tool for inspecting and viewing network\ntraffic dumps, and it’s an incredibly useful debugging and analysis\ntool. 
One of its lovely features is that it understands over 2,500\nnetwork protocols out of the box.\n\nOf course, that doesn’t mean it understands our internal-only\nprotocols.  But that’s OK!  Wireshark has a plugin\ninterface\nfor adding support for arbitrary network protocols.  Henry’s project\nthis summer was to write a plugin for\nAsync-RPC,\nwhich is an internally developed protocol that we use all over the\nplace.\n\nIn order to support Async-RPC, there are a bunch of different bits and\npieces you need to handle.  First of all you need to write code that\nknows how to parse Async-RPC’s core Heartbeat, Query, and\nResponse messages.  You also need to be able to deal with messages\nthat are broken up over multiple packets, and even do data-dependent\npacket reassembly.  And you need to handle decryption, since we use\nKerberos for some of our message flows.\n\nAll of this could be done by just writing more-or-less directly\nagainst Wireshark’s standard imperative C API.  But that didn’t seem\nlike a great idea, since that API is tricky and hard to reason about.\nAlso, by writing directly against the C API, you end up with a parser\nthat you can’t reasonably test without invoking all of the Wireshark\nfunctionality.\n\nInstead, Henry wrote an interface that lets you write your packet\nparsing logic in a way that’s abstracted from the concrete details of\ninteracting with Wireshark, and wraps up the potentially multi-stage\nparsing process in what’s called a monadic interface.  
The details of\nmonads aren’t important, but the key thing is that we get to use the\nspecial let%bind syntax to mark where we’re giving control back to\nWireshark to go grab more information.\n\nHere’s an example from the Async-RPC dissector of how this API works.\n\n(* Parses a Message.t from lib/async_rpc_kernel/src/protocol.ml *)\nlet parse_message query_ids tree fields =\n  let open Parse.Let_syntax in\n  let open Fields.Rpc in\n  let%bind message_type, subtree = Wireshark_tree.add_item tree fields.message_type in\n  match message_type with\n    | Heartbeat -&gt; return query_ids\n    | Query -&gt; parse_query query_ids subtree fields\n    | Response -&gt; parse_response query_ids subtree fields\n\n\n\nThe Wireshark_tree.add_item call has the effect of both adding a\nmessage type UI element to the Wireshark GUI and returning the message\ntype so that it can be matched on to determine how to proceed with\nparsing. fields.message_type is a special field type that contains\nthe brains for parsing and displaying the protocol message type.\n\nPacket reassembly is super easy to use. You call\nParse.reassemble_packets_until to tell Wireshark how many bytes of\ndata you expect to remain in this logical message.\n\nlet parse_priv_encrypted_bigstring ~parse_length_header ~session_key =\n  let open Parse.Let_syntax in\n  let%bind length = parse_length_header () in\n  let%bind () = Parse.reassemble_packets_until ~length in\n  let%bind data = Parse.parse_out (Field.Reader.of_length ~length) in\n  Decrypt.decrypt_krb_priv ~session_key data |&gt; Parse.of_or_error\n\n\n\nIf the data is truncated because the packet has been split, the\nlibrary will handle reassembling packets from that connection until\nthat much data is available for you and then it will call back into\nyour parsing code as if the data were there all along.\n\nThis code also supports seamlessly decrypting and displaying\nKerberized RPC packet dumps.  
The plugin will connect to our internal\nauthorization database and fetch the necessary data for decryption,\nprovided the user in question has the right permissions.\n\nWe’re excited about this project both because it gives us an immediate\npractical tool in the form of Async-RPC support for Wireshark, and\nbecause it gives us a powerful library that makes it simple and\neasy to build new dissectors for new protocols!\n\nBetter sharding through simulated annealing\n\nA lot of our trading systems are structured in a pretty similar way: each\ninstance of the trading system is responsible for some number of\nproducts, and each of those products implies a bunch of data that you\nneed to consume in order to price and trade that product.\n\nETFs are an easy-to-understand example of this.  An ETF (short for\nExchange Traded Fund) is essentially a company whose purpose is to\nhold a basket of shares of other companies.  So, if you buy a share of\nSPY, you’re effectively\nbuying a small slice of every company in the S&P 500 index.\n\nIn order to price an ETF, you want access to the marketdata of the\nconstituents of that ETF; so, in the case of SPY, you’d want to have\naccess to the price of the 500 constituents of the S&P 500 index.\n\nBut, each constituent that you want data for demands some resources\nfrom the instance consuming it, and therein lies an optimization\nproblem.  How do you decide how to spread out ETFs across a collection\nof instances in such a way as to avoid over-taxing any individual\ninstance?  It’s not as easy as it might seem, because the right choice\ndepends not just on the total amount of data you need, but which\nstocks you need data for, since it’s more efficient to put two ETFs\nthat share many of the same constituents on the same instance.\n\nOur baseline approach to this had been pretty primitive.  We used a\nbunch of embarrassingly manual heuristics and effectively did the\nsharding by hand.  
That’s bad for a few reasons: it takes a bunch of\ntime, the by-hand sharding is likely not optimal, and that sharding is\nnot going to get updated as the world changes.  How busy you should\nexpect a given security to be changes over time, and the composition\nof ETFs changes over time as well.  If you don’t update your splits\nfrom time to time, you’re going to end up leaving performance wins on\nthe ground.  And doing the whole thing manually doesn’t incentivize\nyou to do it often.\n\nYulan ended up working with both the trading desk that was running\nthis process and the research group, which pointed her at some\ncleaner cost functions to optimize, as well as encouraging her to try\nout simulated\nannealing in\naddition to the greedy algorithm she started with.\n\nAnd the results look really promising!  We now have a solver that you\ncan tell which ETFs to shard, along with some related metadata, and a few\nminutes later it spits out a sharding that can be used to drive the\nconfiguration of the trading systems.  The results look maybe 10%-20%\nmore efficient than the previous by-hand sharding, and, even better,\nthis saves the desk a bunch of frustrating manual work.\n\nReplacing Re2 with Re\n\nAn important part of managing technical debt is\nmigrations, i.e., organized\nprojects for removing deprecated code and patterns.  Eric Martin’s\nproject was part of just such a migration, in particular, migrating\nfrom one regular expression library, Google’s Re2\nlibrary, to another one, a pure OCaml\nlibrary called Re.  (As an\namusing side-note, our wrapper of Re2 was also an intern project, many\nyears ago!)\n\nWe’ve wanted to get rid of Re2 for a while, but it’s tricky.\nReplacing it is painful in part because the semantics of Re2 regular\nexpressions don’t quite line up with the semantics of regular\nexpressions in Re.  
Eric’s project was to create a new library,\nRe_replace_re2, which is meant to be a drop-in replacement for our\nRe2 wrappers, which we could automatically smash into place across\nthe tree.\n\nHow hard could it be?  Well, the answer, it turns out, is pretty hard.\nThe first task was to get a parser which could be used to produce\nan abstract syntax tree (AST) representing the structure of the\nregular expression.\n\nInstead of writing a parser in OCaml, Eric hooked into the\nexisting Re2 parser, and used that to produce an OCaml data\nstructure representing the AST.  This has a few advantages:\n\n\n  It’s easier than reverse engineering Re2 to write a parser from\nscratch.\n  Even though it doesn’t entirely drop our dependency on Re2, it\ndoes reduce the amount of code we depend on.\n  Down the line, we hope to use it to mechanically convert Re2-style\npatterns to patterns that Re already knows how to parse, letting\nus further reduce our reliance on Re2.\n  Finally, assuming we do eventually need to write a pure OCaml\nparser, having a nice wrapped-up version of the C++ parser makes\ntesting it easier.\n\n\nBut parsing isn’t the only job that needs to be done.  Once you have\nthe AST to represent the Re2 regexp, you still need some way to encode\nit in Re.  And that’s tricky, because, despite them both being\n“regular expression” engines, they don’t quite implement classical\nregular expressions; they always have a few, often ill-specified,\nextensions up their sleeve, which makes the translation a bit more\nfraught.\n\nSo a big part of the story of getting this library right was testing!\nEric deployed\nQuickcheck for\ndoing some\nbisimulation-style\ntesting to check that Re_replace_re2 behaves the same way as Re2\nproper.  
(Quickcheck is a neat idiom for building probability\ndistributions for generating examples for tests, and bisimulation is a\ntechnique where you essentially run two implementations of the same\nprogram together and check the results against each other.)\n\nEric also used a seemingly magical tool called\nAFL, which is a fuzzer that is\nshockingly good at generating test cases to exercise specially\ninstrumented programs.  He used this specifically for finding bugs in\nhis C++ parsing code.\n\nAnyway, the end result of all this good work is that we have a version\nof Re_replace_re2 that’s nearly ready to be used, at least to remove\na large fraction of the uses of Re2.  There are still some semantic\ngaps (some of which depend on fixes to Re itself that Eric\nsubmitted upstream!),\nwhich means it’s not quite ready to replace Re2 everywhere.  All of\nwhich goes to show, migrations are hard work.\n\nSound like fun? Then apply!\n\nI hope this gives you a feel for the kind of work interns get to do at\nJane Street.  We aim to give interns projects that both teach them\ninteresting things, and also have real impact on the organization.  I\nthink we did pretty well on that this summer.\n\nIf this sounds like fun to you, you should\napply.  And\nthis is a good\nplace to start if you want to learn more about the interview process.\n\n(Links to\nReddit\nand HN, if you want\nto comment on the post.)\n",
        "url"      : "https://blog.janestreet.com/what-the-interns-have-wrought-2020/",
        "image"    : "https://blog.janestreet.com/what-the-interns-have-wrought-2020/distributed-wrought.jpg",
        "topic"    :  ["technology","internship"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "The Jane Street Interview Process &mdash; 2020 Edition",
        "date"     : "July 24, 2020",
        "authorId" : "sdefabbia-kane",
        "author"   : "Sam DeFabbia-Kane",
        "tags"     : ["interviewing"],
        "minsToRead" : 4,
        "content"  : "We’re busy preparing for our software engineering fall hiring\nseason. Over the years we’ve\ndone our best to make our interview process more transparent to\ncandidates. While many candidates show up knowing something about what\nour interviews look like, much of the information floating around on\nthe internet is outdated or wrong. These past few months have also\nchanged a lot about the process as we’ve adapted to working from home\nand other effects of COVID-19.\n\nA few notes about our interview process before we jump into specifics:\n\n\n  \n    This post is specific to our general software engineer intern/new\ngrad/experienced roles. Interviews for other roles in engineering will differ\nat least a bit, and interviews for roles in trading, research, and other\nparts of the business will differ substantially.\n  \n  \n    We’re not promising your experience will look exactly like this. We’re\nwriting this down because we think it’ll be helpful, but we are\nconstantly iterating on our interview process and adapting interviews\nto candidates when it makes sense, so parts of this will certainly\nchange.\n  \n  \n    Hiring is a challenge without easy solutions. This process is\nthe best balance we’ve found between being able to hire excellent\npeople and not taking too much time—yours or ours—but we know it’s\nvery far from perfect.\n  \n\n\nThe process\n\nThe exact set of interviews will vary based on the role you’re\napplying for and past experience, but almost everyone will do:\n\n\n  One technical phone interview\n  An on-site interview with 2-4 technical rounds\n\n\nIn normal times, the on-site interview is held in one of our\noffices. 
Right now, due to COVID-19, we are doing onsites via video\nconference rather than in person.\n\nTechnical interview format\n\nThe bulk of our technical interviews ask you to work through a coding\nand algorithms problem with 1-2 full-time Jane Street software engineers.\n\nOur goal is to get a sense of how you work and what it’s like to work\nwith you. We understand it’s far from a perfect simulation: interviews\nare stressful, and we’re seeing you outside of your normal work\nenvironment. We expect strong candidates to be able to program in a\nlanguage of their choice, be effective communicators, and work through\nprogramming problems. But we also expect strong candidates to make\nmistakes and miss things.\n\nIf you’d like to read more about our technical interviews, we’ve\nwritten in the past about what we’re looking\nfor in\ncandidates and what our usual dev interviews are\nlike. Both\nposts are still broadly accurate.\n\nInterview logistics\n\nFor phone interviews we use CoderPad, a shared in-browser code\neditor. It has syntax highlighting and autocomplete suggestions for\nmany languages, and keybindings to mimic various editors.\n\nDue to COVID-19, our usual on-site interviews are being held remotely\nas video conferences. We’re currently using a combination of Webex for\nvideo and CoderPad for coding/a whiteboard.\n\nGenerally we’d like to give you a programming environment close to\nwhat you’re used to within the constraints of an interview. CoderPad\nis the best option we’ve found for phone interviews and for remote\nonsites.\n\nVariations\n\nFor more experienced candidates, we may do an additional non-technical\nphone interview to discuss what you’re looking for and to give you a\nchance to ask any questions you have about Jane Street. 
On site, we\nmay ask you to talk about your past experience, or otherwise do some\ninterviews that aren’t our standard technical questions.\n\nFor London candidates, we ask some people to go through a HackerRank\nround before the technical phone interview because we don’t have the\ncapacity to phone interview everyone we’d like to. We don’t do this in\nNew York or Hong Kong.\n\nNotes on common questions/misconceptions\n\nThere’s a lot of information floating around on the internet about our\ninterview process and what we’re looking for in a successful\ncandidate. Some of it is wrong and some of it is just stale—our\ninterview process has evolved a ton over time. All of the below is\ntrue now, and has been true for years.\n\n\n  \n    We won’t ask you math or probability questions for general software\nengineering roles. Really. We promise.\n  \n  \n    There are no bonus points for using OCaml or another functional\nlanguage in an interview. If you try using OCaml in your interview,\nwe’re likely going to try to dissuade you unless you have actual\nprofessional experience using it. We want to see you at your best,\nwhich means using whatever language you’re most comfortable with.\n  \n  \n    You don’t need to know OCaml or anything about finance. Most\nsoftware engineers we hire come in knowing very little about either\nand we have lots of experience teaching people.\n  \n  \n    You don’t need a Ph.D. or a master’s degree to apply. In fact, the\nvast majority of software engineers we hire don’t have advanced\ndegrees. (Although we’re certainly not going to disqualify you for\nhaving one, either.)\n  \n\n\nWe hope this helps to demystify Jane Street’s software engineering\nevaluation process, especially at a time when the world is moving and\nchanging so much. Our teams are growing and we expect to continue\nhiring.\n\nIf you’re interested in working with us: submit an application on our\nwebsite!\n",
        "url"      : "https://blog.janestreet.com/jane-street-interview-process-2020/",
        "image"    : "https://blog.janestreet.com/jane-street-interview-process-2020/ocaml_code.png",
        "topic"    :  ["technology","interviewing"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Really low latency multipliers and cryptographic puzzles",
        "date"     : "June 22, 2020",
        "authorId" : "bdevlin",
        "author"   : "Benjamin Devlin",
        "tags"     : [],
        "minsToRead" : 13,
        "content"  : "At Jane Street, we have some experience using FPGAs for low-latency\nsystems–FPGAs are programmable hardware where you get the speed of an\napplication-specific integrated circuit (ASIC) but without being\ncommitted to a design that’s burned into the chip. It wasn’t so long\nago that FPGAs were expensive and rare, but these days, you can rent a\n$5,000 card on the Amazon AWS cloud for less than $3 an hour.\n\nRecently, my entry in a competition for a decades-old puzzle showed\nhow clever use of FPGAs can push the boundaries of\nlow-latency computing.\n\nCryptographic puzzles\n\nBack in 1999, MIT’s Computer Science and Artificial Intelligence Lab\ncreated a time capsule that included a cryptographic puzzle designed\nby Ron Rivest (the “R” in RSA). The puzzle was to calculate\n2^(2^t) (mod n) for t =\n79,685,186,856,218 and a 2048-bit semiprime modulus n. (A semiprime\nis the result of multiplying two primes together.) The\nprompt\nhelpfully pointed out that the problem could be solved by starting\nfrom 2 and repeatedly squaring t times mod n. For example (from\nthe prompt):\n\nSuppose n = 11*23 = 253, and t = 10.  Then we can compute:\n\t2^(2^1) = 2^2 = 4 (mod 253)\n\t2^(2^2) = 4^2 = 16 (mod 253)\n\t2^(2^3) = 16^2 = 3 (mod 253)\n\t2^(2^4) = 3^2 = 9 (mod 253)\n\t2^(2^5) = 9^2 = 81 (mod 253)\n\t2^(2^6) = 81^2 = 236 (mod 253)\n\t2^(2^7) = 236^2 = 36 (mod 253)\n\t2^(2^8) = 36^2 = 31 (mod 253)\n\t2^(2^9) = 31^2 = 202 (mod 253)\n\tw = 2^(2^t) = 2^(2^10) = 202^2 = 71 (mod 253)\n\n\n\nRivest’s team chose the number of squarings t so that, if Moore’s\nLaw held up, the puzzle would take around 35 years to crack.\n\n\n  We can expect internal chip speeds to increase by a factor of\napproximately 13 overall up to 2012, when the clock rates reach\nabout 10GHz. 
After that improvements seem more difficult, but we\nestimate that another factor of five might be achievable by 2034.\nThus, the overall rate of computation should go through\napproximately six doublings by 2034.\n\n\nBut then in 2019 it was announced that Belgian programmer Bernard\nFabrot\nhad been able to crack the puzzle in just three and a half years, a\ndecade and a half ahead of schedule. There were no magic tricks in his\napproach. It was just that Rivest’s original estimate was off by a\nfactor of ten. While we don’t have 10GHz CPUs sitting in our desktops\n(mainly due to thermal issues), CPU and multi-core architecture has\nadvanced dramatically. A few weeks after Bernard announced that he\nsolved the puzzle, another group called Cryptophage announced they\nhad, too, using FPGAs in just two months.\n\nAn interesting aspect of this puzzle is that while it’s expensive to\ncompute, it’s cheap for the designer of the puzzle to verify the\nsolution. That’s because if you know the two primes p and q that\nare the factors of n, you can use Euler’s totient function to\ncalculate phi(n) = (p-1)(q-1). Once you have that, the large\nexponent can be reduced from 2^(2^t) (mod\nn) to the much faster to calculate 2^(2^t mod\nphi(n)) (mod n).\n\nThese types of cryptographic puzzles are part of a class of Verifiable\nDelay Functions (VDF): problems that take some medium to large\nquantity of non-parallelizable work to compute, but that can be\nverified quickly. They are useful in decentralized blockchain systems,\nfor instance for randomness beacons, voting, and proofs of\nreplication. While Rivest’s puzzle required secret knowledge to\nquickly verify the result, there are many proposed\nconstructions that allow a VDF\nto be publicly verified without secret knowledge.\n\nUsing FPGAs for low latency\n\nIn late 2019, the VDF alliance began a\ncompetition to find the lowest latency achievable for a VDF problem\nsimilar to Ron Rivest’s 1999 puzzle. 
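The totient shortcut just described is easy to sanity-check on the toy example from Rivest’s prompt. Here is a minimal Python sketch (purely illustrative, not the competition code) that computes 2^(2^t) (mod n) both by brute-force repeated squaring and via phi(n):

```python
# Toy parameters from Rivest's prompt: n = 11 * 23 = 253, t = 10.
p, q = 11, 23
n = p * q
t = 10

# Slow path: start from 2 and square t times mod n, as a solver
# who doesn't know the factors of n must.
w = 2
for _ in range(t):
    w = w * w % n

# Fast path: knowing p and q, compute phi(n) = (p - 1) * (q - 1)
# and reduce the giant exponent 2^t mod phi(n) first.
phi = (p - 1) * (q - 1)
shortcut = pow(2, pow(2, t, phi), n)

assert w == shortcut == 71  # the w from the worked example above
```

For the real puzzle the slow path is the only option without the factors of n, which is exactly what makes this a useful delay function.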
The idea was that by submitting\nsuch problems to fierce competition, you could help battle-test\nsystems that rely on VDFs.\n\nThe competition required participants to solve a scaled-down version\nof the Rivest puzzle, with a t of ~2^33 instead of\n2^46, and a 1024-bit modulus. Contestants were also given\np in advance.\n\nOzturk multiplier\n\nThe winner from the first round of the VDF alliance competition was\nEric\nPearson,\nusing an Ozturk multiplier\narchitecture. This type of\nmultiplier takes advantage of a couple of tricks that FPGAs can do\nextremely efficiently that GPUs or CPUs can’t:\n\n\n  \n    Redundant bit representation (RBR). This means your 1024-bit\nnumber is split into n equally-sized words (in this case n = 64\nwords, each 16 bits), but then each word gets an extra redundant\nbit. The advantage of this is that when you accumulate all the partial\nproducts from a multiplication, you don’t need to propagate carry\nbits through the whole 2048-bit result–you only need to propagate\nit to the neighboring word. On an FPGA, the maximum speed your\ncircuit can operate at will be limited by the slowest path–which is\noften the carry chain. This helps make the squaring part of the\nequation run as fast as possible. When we square, we use the same\nalgorithm as multiplication, except half of the partial products\nare identical. We don’t need to calculate them twice: we just\nbit shift them by 1 bit, essentially doubling them for free.\n  \n  \n    Modulo reduction using lookup tables. Every bit that is set\npast the 1024-bit modulo boundary can be precalculated as a modulo\np value and added back onto the original result–e.g.,\n2^2000 becomes 2^2000 % p. This way, a\n2048-bit result can be reduced back to a 1024 +\nlog2(height of the pre-computed word tree) bit modulo\np value. This takes a lot of memory on the FPGA, but allows you\nto calculate modulo p in just three clock cycles: two to look up\nRAM, one to fold the offsets back into the result. 
Both this\ntechnique and using RBR help speed up the final “modulo p” step\nrequired by the VDF equation.\n  \n\n\n\nOzturk multiplier\nimplemented on an FPGA\n \n\nEric’s implementation took advantage of the LUT size in this\ngeneration of FPGA (6-input) to more efficiently map the memory\nreduction elements and compression trees in the multiplier partial\nproduct accumulation. For the modulo reduction, instead of block RAMs\nhe used the faster LUTRAM, which further saves a clock cycle; taking\nthis saved clock cycle to add more pipelining to paths where timing\nwas critical allowed for an operating frequency of 158.6MHz (total\n4-stage pipeline). This meant a single iteration of the VDF could be\nperformed in 25.2ns. The design spanned most of the FPGA, crossed\nmultiple super logic regions (abbreviated SLR, these are individual\nsilicon dies that are used to form one large chip), and required power\nmanagement to run. Eric commented in his submission that the FPGA\nactually burns as much as 48W when running in the AWS cloud.\n\nPredictive Montgomery multiplier\n\nThere was a third and final round of the competition where more\nalternative approaches to the problem were encouraged–the Ozturk\nmultiplier code was actually supplied in a basic form for the first\nrounds, so it was important to make sure there wasn’t some alternative\ndesign that might be faster than Ozturk. (Initially, no one wanted to\nspend the time trying to implement something from scratch only to find\nout it was not as good.) This sounded interesting to me, so I came up\nwith a novel “Predictive Montgomery multiplier” architecture which was\nthe final round\nwinner.\n\nTricks I used (again not efficiently possible on GPUs or CPUs):\n\n\n  \n    Montgomery multiplication with RBR. 
I decided to implement the\nmodulo reduction scheme using Montgomery’s\nalgorithm,\nwhich requires a transform into a Montgomery domain–but then\nmodulo p becomes a fixed bit-shift which can be done on an FPGA in\nO(1) time.  I only transform in and out of the Montgomery domain\nat the start and end of our 2^n loops, so the overhead is\nnot noticeable. I also modified the algorithm to work with\nRBR–this makes each individual step a little bit faster.\n\n    Montgomery multiplication only involves multiplications, additions,\nand bit shifts–all of which are easy to implement in RBR, and will\nbenefit from the shorter carry-chain. I also add an extra RBR word\nso that in total there are 65 words each of 17 bits to represent a\n1024-bit number. This change allows a modification to the\nMontgomery algorithm so that it is guaranteed to produce a result\nless than p (a traditional Montgomery algorithm only brings the\nresult to less than 2p). By using Montgomery multiplication I was\nable to save on a lot of FPGA area because the modulo step of the\nVDF equation becomes much easier.\n  \n  \n    Log-3 compressor circuit with fast-carry. I decided to\nimplement my multiplier as a simple cross-product multiplier using\nFPGA DSPs (digital signal processors, a small dedicated resource on\nthe FPGA for performing multiplication efficiently). The chosen RBR\nbit width of 17 means one partial product only takes 1 DSP\nresource, followed by a final partial product accumulation stage\nusing general FPGA LUTs. The maximum height of the tree used\nrequires 128 columns, each containing 129 partial products, each 17\nbits wide to be added together. I experimented with different log\nvalues and found folding the tree with arity 3, and using the FPGA\nadder fast-carry (rather than using compressor trees which did not\nutilize the fast-carry) gave the best results. 
This style of\ncompressor implemented with RBR allowed me to squeeze even more\nperformance out of the multiplier than before.\n  \n  \n    Predictive branching. Montgomery multiplication requires a full\n1024-bit squaring, followed by a 1024-bit multiplication where I\nonly care about the lower 1024 bits of the result, followed by a\n1024-bit multiplication, addition, and bit shift down (so I\nessentially only care about the upper 1024 bits of the result).\n\n    The problem here is that a single SLR on the FPGA only has 2280\nDSPs–and I really wanted to work in this budget, since\ncommunicating with multiple SLRs can make the whole design slower.\nA single squaring takes around 2120 DSPs, but a full multiplication\nwill take 4225. To solve this problem I use predictive branching:\nthe multiplier calculates partial products based on inputs fed\nin via a crossbar, where the inputs are selected so I’m only\ncalculating a full square, the lower 65 + x words, or the upper\n65 + x words. Here x is the number of words past the boundary I\ncalculate, to make sure I account for any carry overflow that might\nbe getting erroneously included or discarded due to our RBR form.\n\n    If I detect that the boundary words might have this case\n(detected by all 1s and no free bits to absorb carry), I will\nbranch and calculate the full 2048-bit (130-word) result, with the\ncarry fully propagated. This is so rare that it hardly impacts\nperformance, but this style of predictive branching allows us to\nimplement the entire Montgomery RBR algorithm using a single SLR\nand 2272 DSPs. I also take advantage of the partial product tree\nand shortcut the extra addition in there without adding extra\npipeline stages.\n  \n  \n    Slower, shorter pipeline. Often, pipeline stages can be added\nto allow a design to achieve a higher frequency and higher\nthroughput. 
But since this is for low latency, you actually don’t\nwant to pipeline more than necessary–extra pipelines will increase\nthe routing latency on signals in the FPGA as they now need to make\n“pit stops” to access the register. In my design, the main data\npath loop only has a single pipeline stage on the output of the\npredictive multiplier, which directly feeds back into the\nmultiplier crossbar. This not only helps improve\nlatency, but also reduces the overall power used in the design. A\nslower clock means individual signals will be switching at a lower\nfrequency, which leads to a quadratic reduction in power consumed\non the FPGA.\n  \n\n\n\nMontgomery predictive multiplier implemented on an FPGA\n \n\nMy multiplier design ran at 65MHz and took 46ns (3 clock cycles) to\nfinish one iteration of the VDF. It could be fully implemented on one\nSLR of the FPGA (a third of the FPGA, 2.3x less than the Ozturk\ndesign) and didn’t require any special power management as it consumed\n3.7x less power (both designs were simulated in Vivado, FPGA design\nsoftware). Because my design was smaller, it lends itself to being\nscaled more easily. If this design were scaled up (by using all 3 SLRs on\nthe FPGA) to solve the original 35-year puzzle from Ron Rivest it\nwould have taken us a little over two months!\n\nThe table below shows a comparison of the two 1024-bit architectures,\nwith the ratio of improvement (so higher is better) shown in brackets\nwhere the Ozturk multiplier is the base 1.0x. 
My design had the lowest\nlatency in the round I was competing in and won it, with better\npower efficiency (in the table we show energy consumed as Joules per\noperation) and area efficiency compared to the Ozturk multiplier\narchitecture. But overall Eric’s round 2 design was able to achieve a\nlower absolute latency.\n\nArchitecture          | Area (FPGA KLUTs) | Latency (ns) / op | Power (W)  | Energy (nJ) / op\nOzturk                | 464               | 25.2              | 18.3       | 461\nPredictive Montgomery | 201 (2.3x)        | 46 (0.55x)        | 4.9 (3.7x) | 224 (2.1x)\n\nConclusion\n\nAll of this doesn’t connect directly to the work we do at Jane Street.\nIn particular, we’re not likely to use VDFs in our infrastructure\nanytime soon. But the broad approach of using reconfigurable hardware\nto build solutions that can be orders of magnitude faster than what\ncould be done in an ordinary CPU is at the core of what our group\ndoes. And, as this example highlights, building the most efficient\nsolution requires you to think about completely different architectures,\nresource constraints, and capabilities than you would ordinarily\nconsider in software.\n\nNotes\n\n\n  There are other modulo reduction algorithms–for example Barrett’s\nreduction and the Chinese remainder theorem, as well as other\narchitectures that can be used for the actual underlying\nmultiplier, such as Toom-Cook, Booth, and Karatsuba. I investigated\nall these approaches but found for various reasons that they didn’t\nmap to this problem on an FPGA as well (e.g., Barrett’s algorithm\nrequires subtractions, which would make RBR more complicated and\nslower).\n\n",
        "url"      : "https://blog.janestreet.com/really-low-latency-multipliers-and-cryptographic-puzzles/",
        "image"    : "https://blog.janestreet.com/really-low-latency-multipliers-and-cryptographic-puzzles/lock.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Using ASCII waveforms to test hardware designs",
        "date"     : "June 1, 2020",
        "authorId" : "aray",
        "author"   : "Andrew Ray",
        "tags"     : [],
        "minsToRead" : 10,
        "content"  : "At Jane Street, an “expect\ntest” is a\ntest where you don’t manually write the output you’d like to check\nyour code against – instead, this output is captured automatically\nand inserted by a tool into the testing code itself. If further runs\nproduce different output, the test fails, and you’re presented with\nthe diff.\n\nFor example, we might write:\n\nlet%expect_test \"foo\" = printf \"Mean: %F\" (mean 1. 4.)\n\n\n\nWhen compiled, we will get a corrected file with the following output:\n\nlet%expect_test \"foo\" =\n  printf \"Mean: %F\" (mean 1. 4.)\n  [%expect {| 2.5 |}]\n\n\n\nSince this is correct, we accept the output, and this becomes the\ncontents of our test file. Subsequent runs of the test produce the\nsame output, so there are no further diffs. If, however, we change the\ndefinition of mean to, say, the geometric mean, our tool generates a\nbuild error and a diff:\n\nlet%expect_test \"foo\" =\n  printf \"Mean: %F\" (mean 1. 4.)\n-|  [%expect {| 2.5 |}]\n+|  [%expect {| 2.0 |}]\n\n\n\nThe most interesting aspect of this style is the workflow: we write\nthe initial code, our build system computes the correct output, and\nour editor integration means it takes just a couple of keypresses to\naccept the result if we’re happy with it.  Then we check it into our\nsource code repository, at which point continuous integration systems\nautomatically ensure our tests continue to work in the presence of\nevery developer’s bugbear – other developers.\n\nExpecting things from hardware\n\nWhen we develop a hardware design we always do so in the presence of a\ntestbench.  A testbench provides stimulus for the design, monitors the\ncomputed outputs and ensures that they are correct.\n\nAt Jane Street, developing and testing hardware designs is performed\nusing an Embedded Domain Specific Language called\nHardcaml, which\nunsurprisingly is written in OCaml.\n\nOne of the key features of Hardcaml is that it provides a\ncycle-accurate simulator. 
This allows us to develop both the hardware\ndesign and testbench in OCaml.\n\nThe following is a simple 8-bit counter. It resets back to 0 when the\nclear signal is high, and counts up when the incr signal is high.\nOtherwise it holds its previous value.\n\nopen Hardcaml\nopen Hardcaml.Signal\nopen Hardcaml_waveterm\n\nmodule I = struct\n    type 'a t =\n      { clock : 'a\n      ; clear : 'a\n      ; incr : 'a\n      }\n      [@@deriving sexp_of, hardcaml]\nend\n\nmodule O = struct\n    type 'a t =\n      { dout : 'a[@bits 8]\n      }\n      [@@deriving sexp_of, hardcaml]\nend\n\nlet create (i : _ I.t) =\n  { O.dout =\n      reg_fb\n        (Reg_spec.create ~clock:i.clock ~clear:i.clear ())\n        ~enable:i.incr\n        ~w:8\n        (fun d -&gt; d +:. 1)\n  }\n;;\nval create : t I.t -&gt; t O.t = &lt;fun&gt;\n\n\n\nThe following is a simple testbench for the counter which shows its\nbehaviour for different values of clear and incr.\n\nmodule Simulator = Cyclesim.With_interface(I)(O)\n\nlet testbench () =\n  let sim = Simulator.create create in\n  let inputs = Cyclesim.inputs sim in\n  let outputs = Cyclesim.outputs sim in\n  let step ~clear ~incr =\n    inputs.clear := if clear=1 then Bits.vdd else Bits.gnd;\n    inputs.incr := if incr=1 then Bits.vdd else Bits.gnd;\n    Printf.printf \"clear=%i incr=%i dout=%i\\n\"\n      clear incr (Bits.to_int !(outputs.dout));\n    Cyclesim.cycle sim\n  in\n  step ~clear:0 ~incr:0;\n  step ~clear:0 ~incr:1;\n  step ~clear:0 ~incr:1;\n  step ~clear:1 ~incr:0;\n  step ~clear:0 ~incr:0;\n  step ~clear:0 ~incr:0\n;;\nval testbench : unit -&gt; unit = &lt;fun&gt;\n\ntestbench ();;\nclear=0 incr=0 dout=0\nclear=0 incr=1 dout=0\nclear=0 incr=1 dout=1\nclear=1 incr=0 dout=2\nclear=0 incr=0 dout=0\nclear=0 incr=0 dout=0\n- : unit = ()\n\n\n\nWe can now capture this behaviour as an expect test.\n\nlet%expect_test \"counter\" =\n  testbench ();\n  [%expect {|\n    clear=0 incr=0 dout=0\n    clear=0 incr=1 dout=0\n    clear=0 incr=1 
dout=1\n    clear=1 incr=0 dout=2\n    clear=0 incr=0 dout=0\n    clear=0 incr=0 dout=0\n  |}]\n;;\n\n\n\nWaveform expect tests\n\nDigital waveforms are commonly used during hardware development to\ncapture the time-varying behaviour of signals relative to one another.\nUsing the\nHardcaml_waveterm\nlibrary we can print waveforms from Hardcaml simulations.\n\nlet testbench () =\n  let sim = Simulator.create create in\n  let waves, sim = Waveform.create sim in\n  let inputs = Cyclesim.inputs sim in\n  let step ~clear ~incr =\n    inputs.clear := if clear=1 then Bits.vdd else Bits.gnd;\n    inputs.incr := if incr=1 then Bits.vdd else Bits.gnd;\n    Cyclesim.cycle sim\n  in\n  step ~clear:0 ~incr:0;\n  step ~clear:0 ~incr:1;\n  step ~clear:0 ~incr:1;\n  step ~clear:1 ~incr:0;\n  step ~clear:0 ~incr:0;\n  step ~clear:0 ~incr:0;\n  waves\n;;\nval testbench : unit -&gt; Waveform.t = &lt;fun&gt;\n\nlet waves = testbench ();;\nval waves : Waveform.t = &lt;abstr&gt;\nWaveform.print ~display_height:12 waves;;\n┌Signals────────┐┌Waves──────────────────────────────────────────────┐\n│clock          ││┌───┐   ┌───┐   ┌───┐   ┌───┐   ┌───┐   ┌───┐   ┌──│\n│               ││    └───┘   └───┘   └───┘   └───┘   └───┘   └───┘  │\n│clear          ││                        ┌───────┐                  │\n│               ││────────────────────────┘       └───────────────   │\n│incr           ││        ┌───────────────┐                          │\n│               ││────────┘               └───────────────────────   │\n│               ││────────────────┬───────┬───────┬───────────────   │\n│dout           ││ 00             │01     │02     │00                │\n│               ││────────────────┴───────┴───────┴───────────────   │\n│               ││                                                   │\n└───────────────┘└───────────────────────────────────────────────────┘\n- : unit = ()\n\n\n\nSince they are just text, these can also be captured with expect tests:\n\nlet%expect_test 
\"counter\" =\n  let waves = testbench () in\n  Waveform.print ~display_height:12 waves;\n  [%expect {|\n┌Signals────────┐┌Waves──────────────────────────────────────────────┐\n│clock          ││┌───┐   ┌───┐   ┌───┐   ┌───┐   ┌───┐   ┌───┐   ┌──│\n│               ││    └───┘   └───┘   └───┘   └───┘   └───┘   └───┘  │\n│clear          ││                        ┌───────┐                  │\n│               ││────────────────────────┘       └───────────────   │\n│incr           ││        ┌───────────────┐                          │\n│               ││────────┘               └───────────────────────   │\n│               ││────────────────┬───────┬───────┬───────────────   │\n│dout           ││ 00             │01     │02     │00                │\n│               ││────────────────┴───────┴───────┴───────────────   │\n│               ││                                                   │\n└───────────────┘└───────────────────────────────────────────────────┘\n  |}]\n;;\n\n\n\nExpect test workflow\n\nUsing expect tests in this way makes the hardware development process\na lot more similar to developing software. It allows us to leverage\nyears’ worth of tool development and integrate with the Iron\nworkflow\nthat is central to development at Jane Street.\n\nIt is also simply a lot nicer to write testbenches in OCaml than in\ntraditional RTL languages.\n\nThere are without doubt some drawbacks – embedding a waveform in\nsource code means you can’t reasonably display that much information.\nThis means you have to think carefully about what information to\nexpose (some might argue this is an advantage). 
Also, while we do have\nan interactive viewer application for waveforms, it can be a bit of\nwork to factor code so that it can work both in the interactive viewer\nand expect tests.\n\nFinally, we have yet to teach our diff tools about waveforms – the\ncurrent diffs are not terrible, but they could be better.\n\nEven with these issues, using expect tests while developing hardware\nhas become very common within our team.  This gives us a good\nincentive to spend some time improving our tooling.\n\nOne final note: while we do all of this using our internal tooling,\nall of the underlying code is available on Github as open source:\nexpect tests\nand\nhardcaml.\nAnd while the external tooling isn’t quite as good as the internal\ntools (yet!), Dune provides pretty good support\nfor working with expect tests.\n",
        "url"      : "https://blog.janestreet.com/using-ascii-waveforms-to-test-hardware-designs/",
        "image"    : "https://blog.janestreet.com/using-ascii-waveforms-to-test-hardware-designs/scientist_testing.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Chrome extensions: Finding the missing proof",
        "date"     : "April 17, 2020",
        "authorId" : "mbannister",
        "author"   : "Mark R. Bannister",
        "tags"     : [],
        "minsToRead" : 16,
        "content"  : "Web browsers have supported custom\nplug-ins and\nextensions since\nthe 1990s, giving users the ability to add their own features and\ntools for improving workflow or building closer integration with\napplications or databases running on back-end servers.\n\nThe Google Chrome browser supports\nextensions that add to its\nfunctionality and which are typically hosted on the Chrome Web\nStore, but\nit is often desirable for firms to develop and host their own\nextensions internally.\n\nAccording to Google’s\nAlternative\nExtension Distribution\nOptions,\nChrome extensions that are developed and hosted on a firm’s internal\nwebsite are known as external extensions.  This is slightly\nconfusing at first, but external refers to the extension being\nexternal to the Chrome Web Store, not being external to the company\nthat developed it.\n\nWe wanted to host our own Chrome extensions on an internal web server\nfor web browsers running on the Linux operating system.  However,\ndespite setting up an example\nextension and\nfollowing the Linux\nhosting\nrequirements precisely, we would receive the following error when\nattempting to install the extension in the browser:\n\nPackage is invalid: CRX_REQUIRED_PROOF_MISSING\n\n\n\nThe error was devoid of explanation or reason, leaving little to go\non.  Some research on the web revealed that many people had complained\nabout this error but each example found seemed to be for different\nreasons that did not match our case.  On the road to a solution we\npassed many landmarks, each time expecting either success or at least\na different, more informative error message.  Unfortunately, each\nstep we took revealed no further information, no clue that we had even\nprogressed an inch, like we were trying to guess the secret password\nto enter Aladdin’s cave.  “CRX_REQUIRED_PROOF_MISSING” was the\ncryptic greeting every time.\n\nWe did, eventually, solve the conundrum.  
For the benefit of others\nattempting the same feat, this blog post will walk you through how to\ninstall Chrome extensions from an internal web server.  The\ninstructions will have a heavy leaning toward Linux, although some of\nthe lessons learned will apply to other operating systems.\n\nBuild an example extension\n\nFollow the Getting Started\nTutorial to build\nan extension you can test with.  When this extension is built,\ndragging and dropping it into the\nchrome://extensions page will install the\nextension.\n\nThe tutorial walks you through using Chrome’s Load unpacked\nbutton in order to install the extension directly from your\ndevelopment folder.  While there is also a Pack extension button\nthat will create a CRX file that contains your extension, you may\nwonder, as we did, how to create a CRX file from the command-line.\nThe CRX file format changed from CRX2 to CRX3 during 2019, leaving\nmany scripts that you can find while trawling the internet\nbroken.\n\nTo pack an extension from the command line, you can use the browser’s\n--pack-extension option:\n\n$ chrome --pack-extension=&lt;extension directory&gt;\n\n\n\nwhich will generate a new private/public key pair saving a new .crx\nand .pem file in the current directory, or:\n\n$ chrome --pack-extension=&lt;extension directory&gt; \\\n         --pack-extension-key=&lt;extension pem file&gt;\n\n\n\nto use an existing key file.  More details on packaging can be found\nhere.\n\nYou will need to obtain the extension ID and make a note of it.  This\nis the unique identifier that Chrome will use to refer to your\nextension and will be required in some configuration files later on.\nIf you install the extension into Chrome by dragging and dropping,\nthen Chrome will display the extension ID for you.  
Otherwise, to do\nthis programmatically using the .pem file, see\nhere.\n\nUnfortunately, Chrome on Linux expects to have an X display for the\n--pack-extension command even though it does not open a window.\nAlso the --headless option does not seem to work with\n--pack-extension.  So if you are trying to get this to work on a\nserver that has no X display, I have found that\nXvfb\nmakes it possible, e.g.\n\n$ Xvfb &\n$ DISPLAY=:0 chrome --pack-extension=&lt;extension_directory&gt;\n$ kill %1\n\n\n\nSetting up a test web server\n\nConfigure for SSL connections\n\nNext you will need a web server with an SSL configuration.  We used\nnginx which was quick to compile, install and\nconfigure.  The web server needs to be configured to listen for SSL\nconnections (usually on port 443).\n\nConfigure MIME types\n\nMake sure that the mime.types file is correctly configured for the\nfollowing file extensions:\n\ntext/xml                                         xml;\napplication/x-chrome-extension                   crx;\n\n\n\nCreate SSL certificates\n\nTo get Chrome to trust SSL connections to the test web server, create\na small certificate chain: a server certificate signed by a test CA\ncertificate that you load into the Chrome browser as a trusted\ncertificate authority.\n\nTo create the CA certificate, start with a ca.conf file like this:\n\n[req]\nprompt = no\ndefault_md = sha256\ndistinguished_name = dn\n\n[dn]\nC = &lt;country_code&gt;\nST = &lt;state&gt;\nL = &lt;locality&gt;\nO = &lt;organisation name&gt;\nCN = My Test Root CA\n\n\n\nWe will use this configuration file in a moment.  
You will also need a\nserver.conf file that looks like this:\n\n[req]\nprompt = no\ndefault_md = sha256\nx509_extensions = v3_req\ndistinguished_name = dn\n\n[dn]\nC = &lt;country_code&gt;\nST = &lt;state&gt;\nL = &lt;locality&gt;\nO = &lt;organisation name&gt;\nCN = %HOSTNAME%\n\n[v3_req]\nsubjectAltName = @alt_names\n\n[alt_names]\nDNS.1 = *.&lt;domain&gt;\n\n\n\nThis will be used to create an extended X.509 certificate with a\nsubjectAltName attribute, required by Chrome browsers.  The\nalt_names section may contain DNS.2 and DNS.3 and so on for as\nmany domain names that your web server is going to be answering for.\nThe %HOSTNAME% text can be left as-is, this will be substituted for\nthe real hostname below and allows for the process to be easily\nscripted.\n\nNow you have the ca.conf and server.conf files, you can use\nOpenSSL to generate the certificates you\nneed.  We will produce these files inside keys and certs\nsubdirectories, so create these first and keep them secure:\n\n$ mkdir -m 0700 keys certs\n\n\n\nNow either run the individual commands provided below, or you may\nshortcut the process by running this\ngenerate-ssl-cert script.\n\nCreate a new CA public/private key pair and X.509 certificate:\n\n$ root_key=keys/rootCA.key\n$ root_crt=certs/rootCA.crt\n$ root_days=730\n$ openssl req -x509 -config ca.conf \\\n        -newkey rsa:4096 -nodes -keyout $root_key -out $root_crt \\\n        -days $root_days\n\n\n\nYou can view the new certificate with:\n\n$ openssl x509 -in $root_crt -noout -text | less\n\n\n\nNow use OpenSSL to generate a new server private/public key pair and a\ncertificate signing request (CSR):\n\n$ hostname=&lt;fully-qualified hostname&gt;\n$ host_key=\"keys/$hostname.key\"\n$ host_csr=\"certs/$hostname.csr\"\n$ openssl req -new -reqexts v3_req \\\n        -config &lt;(sed \"s/%HOSTNAME%/$hostname/\" server.conf) \\\n        -newkey rsa:4096 -nodes -keyout \"$host_key\" -out \"$host_csr\"\n\n\n\nFinally, sign the CSR with the CA 
private key and generate the server\ncertificate:\n\n$ host_crt=\"certs/$hostname.crt\"\n$ host_days=365\n$ openssl x509 -req -in \"$host_csr\" -extensions v3_req \\\n        -extfile &lt;(sed \"s/%HOSTNAME%/$hostname/\" server.conf)  \\\n        -CA $root_crt -CAkey $root_key -CAcreateserial \\\n        -days $host_days -out \"$host_crt\"\n\n\n\nYou can view the new certificate with:\n\n$ openssl x509 -in $host_crt -noout -text | less\n\n\n\nMove the server key and certificate into the locations specified in\nthe web server configuration, and start/restart the web server.\n\nImport CA root certificate into Chrome browser\n\nNow you need to add the self-signed CA root certificate (rootCA.crt)\ninto your test Chrome web browser.  Open\nchrome://settings/certificates,\nclick on Authorities and then Import.  Locate the CA certificate\nand when prompted for the trust settings, check all of the available\nboxes.\n\nConfirm that you can view the web server’s index.html document over\nHTTPS.  Chrome shouldn’t complain about the SSL certificate not being\ntrusted, there should be a closed padlock symbol to the left of the\nURL in the address bar.  If you click on the padlock symbol, it should\nsay in green: Connection is secure.\n\nSet up web server documents (CRX/XML)\n\nYou will need to place the CRX file (packed extension) you created\nearlier into the web server’s documents directory.  
You will also need\nto create an XML file that describes the location of the CRX file,\nlike this, which you also place on the web server:\n\n&lt;?xml version='1.0' encoding='UTF-8'?&gt;\n&lt;gupdate xmlns='http://www.google.com/update2/response' protocol='2.0'&gt;\n  &lt;app appid='&lt;extension ID&gt;'&gt;\n    &lt;updatecheck codebase='https://&lt;fully-qualified hostname&gt;/&lt;filename&gt;.crx' version='1.1' /&gt;\n  &lt;/app&gt;\n&lt;/gupdate&gt;\n\n\n\nAt the time of writing, the Linux\nhosting\npage was erroneously quoting that the gupdate tag in this XML\ndocument should refer to an https URL.  This is not true.  The\ngupdate tag must use the http URL as above.  This URL is not\nactually followed by the browser but is only used as a hint to the\nparser about the XML structure, as seen here in the Chromium source\ncode.\n\nNow you need to edit the manifest.json file inside your Chrome\nextension and add the following key which points to your XML file:\n\n\"update_url\": \"https://&lt;fully-qualified hostname&gt;/&lt;filename&gt;.xml\",\n\n\n\nRe-pack your extension with the updated manifest to the .crx file,\nremembering to use the .pem file from earlier so that the extension\nID remains the same, and copy into place on the web server.  
If you\nforget to use the .pem file then a new public/private key pair is\ngenerated and as the extension ID is\ncomputed from the public key\nthe ID would change as a result, which is generally not what you\nwant.\n\nChrome enterprise policies\n\nTo allow your extension to be installed manually, or to have it\nforcibly installed, you will need to set the appropriate\npolicies.\n\nExtensionInstallSources must be configured with URLs or wildcards\nmatching the web address where the extension is hosted as well as the\nweb address that contains the link to the extension if a user is\nexpected to click on a link to install it (the referrer), e.g.\n\n  \"ExtensionInstallSources\": [\n    \"https://&lt;fully-qualified hostname&gt;/*\",\n    ...\n  ],\n\n\n\nThis caught me out for a while as the documentation made no mention of\nit, but you will not be able to install an extension by typing in, or\ncopying and pasting, the URL of the .crx file into the browser’s\naddress bar.  Without the referrer URL in this policy you won’t be able\nto install the extension by clicking on a link.\n\nYou may wish to put a * in your ExtensionInstallBlacklist for\nordinary users which disables the Load unpacked button in\nchrome://extensions.  If\nExtensionInstallBlacklist contains a * or any wildcard that would\nend up blacklisting the URL of your internal extension, then you must\nexplicitly permit your extension ID in the\nExtensionInstallWhitelist, e.g.\n\n  \"ExtensionInstallWhitelist\": [\n    \"&lt;extension ID&gt;\",\n    ...\n  ],\n\n\n\nTo forcibly install your extension you may add it to the\nExtensionInstallForcelist policy.  
This policy line must point to\nthe .xml file (not the .crx file), e.g.\n\n  \"ExtensionInstallForcelist\": [\n    \"&lt;extension ID&gt;;https://&lt;fully-qualified hostname&gt;/&lt;filename&gt;.xml\",\n    ...\n  ],\n\n\n\nTo confirm that the web browser has the expected policy configuration,\nyou can view the current policy settings at\nchrome://policy.\n\nIf you are using the ExtensionInstallForcelist policy to install\nyour extension, note that the moment you remove your extension ID from\nthat policy it should be automatically removed from the browser.\n\nChrome policies per user on Linux\n\nIf you need to vary the Chrome web browser policy files by user on\nLinux, you’ll quickly discover that Chrome does not support\nthis.  However,\nit is possible to achieve this using /etc/namespace.conf, otherwise\nknown as polyinstantiated\ndirectories.\n\nLet’s say your policy file is called\n/etc/opt/chrome/policies/managed/my_policy.json.  With\npolyinstantiated directories, it is possible to provide a particular\ntailored version of that file by user, as the PAM session module can\noverlay the directory according to a set of rules.\n\nTo do this, first create a directory where the source files live.  For\ntesting purposes, I put this under /etc/opt/chrome/policies/users.\nRun these commands as the root user:\n\n$ cd /etc/opt/chrome/policies\n$ mkdir -m 0 users\n$ mkdir users/my_user\n$ cp managed/my_policy.json users/my_user\n\n\n\nThe permissions on the parent directory have to be 000, as required\nby pam_namespace(8).\n\nNow edit /etc/opt/chrome/policies/users/my_user/my_policy.json to\ncontain the specific changes required for the user.\n\nLastly, configure pam_namespace to map this directory over the top\nof the original directory when that specific user logs in.  This is\ndone by appending the following line to\n/etc/security/namespace.conf.  Warning!  
Before you do this make\nsure you have a terminal window open as root on your test host so you\ndon’t accidentally lock yourself out if anything goes wrong!\n\n/etc/opt/chrome/policies/managed    /etc/opt/chrome/policies/users/ user    ~my_user\n\n\n\nThe fields are delimited by whitespace.  The first field is the target\ndirectory that will be replaced.  The second field locates where the\nuser-specific directories originate from.  The third field specifies\nthat the username should be appended to the second field to find the\nsource directory.  The fourth field starts with ~ and is a\ncomma-separated list of all users this rule applies to.\nAlternatively, without the ~ prefix, this can be a comma-separated\nlist of all users the rule does not apply to.\n\nNow when I open another terminal window and login, as pam_namespace is\nalready configured in the PAM stack, I see that\n/etc/opt/chrome/policies/managed/my_policy.json contains my\nuser-specific modification.\n\nIf this is not working as expected, check that all of the appropriate\nfiles in /etc/pam.d are configured to require pam_namespace.so\nlike this:\n\nsession    required     pam_namespace.so\n\n\n\nAlso watch out for incorrect syntax in /etc/security/namespace.conf.\nThe directory in the first field must exist already and the second\nfield must end with a slash.  If anything is wrong, the user won’t be\nable to login at all!  
In this event, you’ll not see much in\n/var/log/messages:\n\n2020-04-17T18:09:54.297987+00:00 my_host systemd[1]: Started Session 679 of user my_user.\n2020-04-17T18:09:54.298292+00:00 my_host systemd-logind[2345]: New session 679 of user my_user.\n2020-04-17T18:09:54.400902+00:00 my_host systemd-logind[2345]: Removed session 679.\n\n\n\n… but you should find something useful in /var/log/secure, for\nexample:\n\n# grep pam_namespace /var/log/secure\n2020-04-17T17:55:01.344288+00:00 my_host sshd[24295]: pam_namespace(sshd:session): Mode of inst parent /etc/opt/chrome/policies not 000 or owner not root\n2020-04-17T17:56:13.826555+00:00 my_host sshd[24583]: pam_namespace(sshd:session): Mode of inst parent /etc/opt/chrome/policies not 000 or owner not root\n2020-04-17T17:56:20.114737+00:00 my_host sshd[24608]: pam_namespace(sshd:session): Mode of inst parent /etc/opt/chrome/policies not 000 or owner not root\n2020-04-17T17:57:38.896329+00:00 my_host sshd[24867]: pam_namespace(sshd:session): Error stating /etc/opt/chrome/policies/managed/test: No such file or directory\n2020-04-17T18:08:39.280280+00:00 my_host sshd[27049]: pam_namespace(sshd:session): Error stating /etc/opt/chrome/policies/managed/test: No such file or directory\n\n\n\nIf you’re really stuck, you can add the debug argument after\npam_namespace.so in the appropriate /etc/pam.d configuration file,\nwhich adds more verbose logging to /var/log/secure.\n\nSummary\n\nIn summary, the main points to focus on in order to support installing\nChrome extensions on Linux from an internal web server instead of the\nChrome Web Store are:\n\n\n  Google make it intentionally difficult to host Chrome extensions on\nan internal web server, I presume for security reasons.\n  There is about one error you’ll ever get from Chrome when trying to\ninstall an extension from an internal web server and something isn’t\nconfigured right: CRX_REQUIRED_PROOF_MISSING.\n  Set up a web server such as nginx to run an instance on port 443 
for\ntesting using a test SSL certificate signed with a self-signed CA\ncert that you import into Chrome as a trusted certificate.\n  The manifest.json file inside the Chrome extension must have an\nupdate_url parameter pointing to an XML file hosted on the\ninternal web server.\n  The packed extension format changed from CRX2 to CRX3 in 2019, so\nmany tools found on the web no longer work.  Use chrome\n--pack-extension to do this on the command-line.\n  The XML file contains the extension ID, which is derived from the\npublic key that accompanies the CRX file.  Contrary to currently\navailable documentation, the gupdate tag in the XML file must have\nthe exact URL http://www.google.com/update2/response and not use\nhttps.\n  The web server must use the correct MIME type for CRX files:\napplication/x-chrome-extension.\n  If you need to vary the Chrome policy file for different users, you\nmust use polyinstantiated directories to achieve this, as Chrome does\nnot offer OS user level policies on Linux.\n  You cannot type in or copy/paste the URL of a CRX file into the\nbrowser’s address bar; you must instead click a link provided on a\nweb page, and that website must be permitted in the\nExtensionInstallSources policy.\n\n",
        "url"      : "https://blog.janestreet.com/chrome-extensions-finding-the-missing-proof/",
        "image"    : "https://blog.janestreet.com/chrome-extensions-finding-the-missing-proof/magnifying-glass.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Watch all of Jane Street's tech talks",
        "date"     : "February 20, 2020",
        "authorId" : "jsomers",
        "author"   : "James Somers",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "Jane Street has been posting tech talks from internal speakers and\ninvited guests for years—and they’re all available on our YouTube\nchannel:\n\nJane Street Tech Talks - YouTube\n\nSome of our favorites from over the years:\n\n\n  How to Build an Exchange, Brian Nigito\n  Unboxed Types for OCaml, Stephen Dolan\n  How Jane Street Does Code Review\n  Seven Implementations of Incremental, Yaron Minsky\n  RustBelt: Logical Foundations for the Future of Safe Systems\nProgramming, Derek Dreyer\n  The Algorithm for Precision\nMedicine, Matt Might\n\n\nWe have new talks regularly, and we’ll be adding them as they happen,\nso take a moment to subscribe!\n",
        "url"      : "https://blog.janestreet.com/watch-all-of-jane-streets-tech-talks/",
        "image"    : "https://blog.janestreet.com/watch-all-of-jane-streets-tech-talks/youtube-techtalks.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Troubleshooting systemd with SystemTap",
        "date"     : "February 3, 2020",
        "authorId" : "mbannister",
        "author"   : "Mark R. Bannister",
        "tags"     : [],
        "minsToRead" : 18,
        "content"  : "When we set up a schedule on a computer, such as a list of commands to\nrun every day at particular times via Linux cron\njobs, we\nexpect that schedule to execute reliably.  Of course we’ll check the\nlogs to see whether the job has failed, but we never question whether\nthe cron daemon itself will function.  We always assume that it will,\nas it always has done; we are not expecting mutiny in the ranks of the\noperating system.\n\nSo when we had recently experienced problems with Linux logins and\ncron jobs failing for no apparent reason, this came as a big\nsurprise.  It was as if the cron daemon had gone on vacation.  There\nwas something deeply disconcerting about such a foundational piece of\ninfrastructure suddenly becoming unreliable.\n\nThere was a common fingerprint that accompanied these events, a clue\ndiscovered in /var/log/messages:\n\nsystemd[1]: Failed to propagate agent release message: No buffer space available\nsystemd-logind[2781]: Failed to abandon session scope: No buffer space available\nsystemd-logind[2781]: Failed to start session scope session-19928.scope: No buffer space available\ndbus[1467]: [system] Rejected: destination has a full message queue, 0 matched rules; type=\"signal\", sender=\":1.0\" (uid=0 pid=1 comm=\"/usr/lib/systemd/systemd --switched-root --system \") interface=\"org.freedesktop.DBus.Properties\" member=\"PropertiesChanged\" error name=\"(unset)\" requested_reply=\"0\" destination=\"org.freedesktop.DBus\" (uid=0 pid=2781 comm=\"/usr/lib/systemd/systemd-logind \")\n\n\n\nSystemd is a daemon responsible for starting and\ntracking Linux processes, among other things, which it does by\ndefining units for every item of interest, each of which has a\ncollection of properties and dependency\ninformation. D-Bus\nis a message bus system, designed to make it easier for different\nLinux components to communicate with one another. 
How these\ncomponents related to cron was unclear, but we discovered a trail\nleading back to the Linux\nautomounter,\nwhich is responsible for mounting directories automatically as users\ntraverse more deeply into a directory hierarchy.\n\nWe observed that if a lot of automounted directories were mounted in\nquick succession (known in the trade as a “mount storm”), systemd\nwould choke. This was discussed\nupstream, where the\nadvice was that the systemd buffer space should be increased\n– and indeed it had been very\nrecently.\n\nWith this information, we had a fix in hand. But there were still\nsome intriguing questions that had not yet been answered:\n\n\n  What is this buffer space and why did it run out?\n  How close are we to the limit?\n  By increasing the buffer’s size, are we just kicking the can down\nthe road?\n\n\nTurning on the tap\n\nI decided to investigate, so began by turning to\nSystemTap, a diagnostic tool\nfor instrumenting and inspecting parts of Linux that traditional tools\ncannot reach. Among other things, SystemTap allows you to place probes\nin a kernel module that will fire when particular functions run. From\nthere, you can query the function parameters or the return value and\nwalk the stack or display a stack trace. These scripts can be launched\non a running system and can query already-running processes.\n\nWith SystemTap, all I needed was a way to reproduce the original\ncondition. I found that listing the contents of thousands of\nautomounted directories would occasionally cause the issue.\n\nThe notes in systemd issue\n#13674 indicated\nthat the problem related to the BUS_RQUEUE_MAX or BUS_WQUEUE_MAX\nsettings in systemd. 
A quick rg in the systemd source code turned up\nthat these C macros were used to check against the bus-&gt;rqueue_size\nand bus-&gt;wqueue_size buffers, where bus was an sd_bus structure:\n\n$ rg 'BUS_[RW]QUEUE_MAX'\nsrc/libsystemd/sd-bus/sd-bus.c\n1677:        if (bus-&gt;rqueue_size &gt;= BUS_RQUEUE_MAX)\n1785:                if (bus-&gt;wqueue_size &gt;= BUS_WQUEUE_MAX)\n\nsrc/libsystemd/sd-bus/bus-internal.h\n329:#define BUS_WQUEUE_MAX (192*1024)\n330:#define BUS_RQUEUE_MAX (192*1024)\n\n\n\nNext, I found all of the functions that changed these queue sizes,\nand instrumented them with a stap script, first confirming that the\nfunction parameters I needed were available:\n\n# stap -L 'process(\"/usr/lib/systemd/systemd\").function(\"bus_send_internal\").return'\nprocess(\"/usr/lib/systemd/systemd\").function(\"bus_send_internal@src/libsystemd/sd-bus/sd-bus.c:1718\").return $return:int $bus:sd_bus* $_m:sd_bus_message* $cookie:uint64_t* $hint_sync_call:_Bool $m:sd_bus_message* $__PRETTY_FUNCTION__:char[] const\n\n\n\nI could display the size of the queues and the return codes with a\nscript I wrote, systemd-bus.stp.\n\n# stap systemd-bus.stp 5000\n\n\n\n5000 is the number of milliseconds to wait between re-displaying each\nheader. 
Here’s the output from around the time the initial problem was\nreproduced:\n\n   PID FUNCTION                  RQUEUE WQUEUE RETURN\n     1 bus_send_internal              0 196601 1\n     1 bus_send_internal              0 196602 1\n     1 bus_send_internal              0 196603 1\n     1 bus_send_internal              0 196604 1\n     1 bus_send_internal              0 196605 1\n     1 bus_send_internal              0 196606 1\n     1 bus_send_internal              0 196607 1\n     1 bus_send_internal              0 196608 ENOBUFS\n--- Wed Nov 20 09:02:05 2019 GMT ---\n   PID FUNCTION                  RQUEUE WQUEUE RETURN\n     1 bus_send_internal              0 196608 ENOBUFS\n--- Wed Nov 20 09:02:15 2019 GMT ---\n   PID FUNCTION                  RQUEUE WQUEUE RETURN\n     1 bus_send_internal              0 196608 ENOBUFS\n--- Wed Nov 20 09:02:25 2019 GMT ---\n   PID FUNCTION                  RQUEUE WQUEUE RETURN\n     1 bus_send_internal              0 196608 ENOBUFS\n--- Wed Nov 20 09:02:35 2019 GMT ---\n   PID FUNCTION                  RQUEUE WQUEUE RETURN\n     1 bus_send_internal              0 196608 ENOBUFS\n--- Wed Nov 20 09:02:45 2019 GMT ---\n   PID FUNCTION                  RQUEUE WQUEUE RETURN\n     1 bus_send_internal              0 196608 ENOBUFS\n--- Wed Nov 20 09:02:55 2019 GMT ---\n   PID FUNCTION                  RQUEUE WQUEUE RETURN\n     1 bus_send_internal              0 196608 ENOBUFS\n     1 dispatch_rqueue                0  53865 -\n     1 bus_rqueue_make_room           0  53865 0\n     1 bus_socket_make_message        0  53865 1\n     1 dispatch_rqueue                0  53849 -\n     1 bus_rqueue_make_room           0  53849 0\n     1 bus_socket_make_message        0  53849 1\n     1 bus_send_internal              0  53849 1\n     1 bus_rqueue_make_room           0  53850 0\n     1 bus_socket_make_message        0  53850 1\n     1 bus_rqueue_make_room           1  53850 0\n     1 bus_socket_make_message        1  53850 1\n     1 
bus_rqueue_make_room           2  53850 0\n     1 bus_socket_make_message        2  53850 1\n--- Wed Nov 20 09:03:05 2019 GMT ---\n   PID FUNCTION                  RQUEUE WQUEUE RETURN\n     1 bus_rqueue_make_room           3      0 0\n     1 bus_socket_make_message        3      0 1\n     1 sd_bus_call                    0  53849 1\n     1 bus_send_internal              3      0 1\n     1 sd_bus_call                    3      0 1\n     1 dispatch_rqueue                3      0 -\n     1 dispatch_rqueue                2      0 -\n     1 dispatch_rqueue                1      0 -\n     1 bus_send_internal              0      0 1\n     1 bus_rqueue_make_room           0      0 0\n     1 bus_socket_make_message        0      0 1\n     1 sd_bus_call                    0      0 1\n\n\n\nWhen the problem occurs, it is the write queue (WQUEUE) that fills\nup, as seen just before 09:02:05 when it reaches 196,608 items, after\nwhich the bus_send_internal function returns ENOBUFS\ninstead. About 1 minute later the number of items in the write queue\ndropped down to zero again, so something eventually drained the queue.\n\nHowever, I was finding it difficult to reproduce this and I did not\nknow why. If I was lucky, I was able to observe this behaviour at most\nonce a day. When I did reproduce the problem and collected the output\nabove, in another window an attempt to log in to the host hung until\nthe size of the queue dropped, at which point I observed the following\nerror in the messages log:\n\nsystemd-logind[2184]: Failed to start session scope session-750.scope: Connection timed out\n\n\n\nTaking a closer look\n\nSystemTap allowed me to identify precisely which queue was filling up and\nwatch in real-time while it did.\n\nHowever, I wanted to know what data was going into the queue and what\nwas responsible for draining it. 
That would help me understand the\nrelationship with the automounter – and also help explain why I was\nseeing this issue only sporadically, not every time I traversed a\ntree of automounted directories.\n\nExamining the source code, I could see that the bus_send_internal()\nfunction accepted a pointer to a structure called sd_bus_message.\nThe definition of that structure begins with the following members:\n\nstruct sd_bus_message {\n        unsigned n_ref;\n\n        sd_bus *bus;\n\n        uint64_t reply_cookie;\n\n        const char *path;\n        const char *interface;\n        const char *member;\n        const char *destination;\n        const char *sender;\n...\n\n\n\nWith some trial-and-error, I found that interface, member and\npath contained the pertinent information.  Displaying this data\nrequired the following probe:\n\nprobe process(\"/usr/lib/systemd/systemd\").function(\"bus_send_internal\").return\n{\n    printf(\"%-40s %-25s %s\\n\",\n           user_string_quoted(@entry($_m-&gt;interface)),\n           user_string_quoted(@entry($_m-&gt;member)),\n           user_string_quoted(@entry($_m-&gt;path)))\n}\n\n\n\nThe probe printed a screenful of information, which began:\n\n\"org.freedesktop.systemd1.Manager\"       \"UnitNew\"                 \"/org/freedesktop/systemd1\"\n\"org.freedesktop.systemd1.Manager\"       \"UnitRemoved\"             \"/org/freedesktop/systemd1\"\n\"org.freedesktop.systemd1.Manager\"       \"UnitNew\"                 \"/org/freedesktop/systemd1\"\n\"org.freedesktop.DBus.Properties\"        \"PropertiesChange\\260\"    \"/org/freedesktop/systemd1/unit/usr_2dmount_2dapp_2dbucket1_2emount\"\n\"org.freedesktop.DBus.Properties\"        \"PropertiesChange\\260\"    \"/org/freedesktop/systemd1/unit/usr_2dmount_2dapp_2dbucket1_2emount\"\n\"org.freedesktop.DBus.Properties\"        \"PropertiesChange\\260\"    \"/org/freedesktop/systemd1/unit/usr_2dmount_2dapp_2dbucket2_2emount\"\n\"org.freedesktop.DBus.Properties\"        
\"PropertiesChange\\260\"    \"/org/freedesktop/systemd1/unit/usr_2dmount_2dapp_2dbucket2_2emount\"\n\"org.freedesktop.DBus.Properties\"        \"PropertiesChanged\"       \"/org/freedesktop/systemd1/unit/usr_2dmount_2dapp_2dbucket3_2emount\"\n\"org.freedesktop.DBus.Properties\"        \"PropertiesChanged\"       \"/org/freedesktop/systemd1/unit/usr_2dmount_2dapp_2dbucket3_2emount\"\n...\n\n\n\nThis made me suspect that systemd units were being set up for each\nautomounted directory, as I recognised the\n/usr_2dmount_2dapp_2dbucket1 text as one of our mount-points:\n/usr/mount/app/bucket1.  A quick check confirmed this:\n\n$ systemctl list-units | grep '\\.mount'\n-.mount                                      loaded active mounted   /\ndev-hugepages.mount                          loaded active mounted   Huge Pages File System\ndev-mqueue.mount                             loaded active mounted   POSIX Message Queue File System\nhome-mark.mount                              loaded active mounted   /home/mark\nusr-mount-app-bucket1.mount                  loaded active mounted   /usr/mount/app/bucket1\nusr-mount-app-bucket2.mount                  loaded active mounted   /usr/mount/app/bucket2\nusr-mount-app-bucket3.mount                  loaded active mounted   /usr/mount/app/bucket3\nusr-mount-app-bucket4.mount                  loaded active mounted   /usr/mount/app/bucket4\nusr-mount-app-bucket5.mount                  loaded active mounted   /usr/mount/app/bucket5\nrun-user-0.mount                             loaded active mounted   /run/user/0\nrun-user-100.mount                           loaded active mounted   /run/user/100\nrun-user-400.mount                           loaded active mounted   /run/user/400\nsys-kernel-config.mount                      loaded active mounted   Configuration File System\nsys-kernel-debug.mount                       loaded active mounted   Debug File System\ntmp.mount                                    loaded active mounted   
/tmp\nvar-lib-nfs-rpc_pipefs.mount                 loaded active mounted   RPC Pipe File System\nvar-lib-sss-db.mount                         loaded active mounted   Mount SSSD cache to tmpfs\nvar.mount                                    loaded active mounted   /var\n\n$ systemctl show home-mark.mount | wc -l\n117\n\n\n\nAutomounted directories – specifically those that were being mounted\nautomatically by /usr/sbin/automount – were ending up as systemd\nunits.  Picking one example (an automounted home directory) revealed\nit contained 117 properties.  This was not just an unexpectedly large\nnumber – it was surprising to me that systemd would have a unit\ndefined for /home/mark at all.  The automounter was responsible for\nmounting and unmounting it; why did systemd need to know anything\nabout it?\n\nThere are actually two distinct systems in Linux that can handle\nautomounting and they function independently of each other: the\ntraditional automounter,\nwhich we were using, and the systemd\nautomounter. Searching\nthrough the automounter source code for anything related to systemd\nrevealed nothing. I could not figure out how or where the automounter\nwas sending details of each of its mounts to systemd, nor could I\nguess at the purpose for this exchange of information.\n\n“Use the Source, Luke!”\n\nMy mind was cast back to the words of an old colleague who was a\nmasterful debugger, able to find the root cause of almost any\nLinux-related problem, no matter how deep into user-space, the kernel,\nor machine code the problem was hiding.  
“Use the Source, Luke!” he\nwould say, conjuring images of the Jedi Order.\n\nI continued to examine the systemd source code and it wasn’t long\nbefore I encountered the mount_sigchld_event() function, and this\ncomment:\n\n/* So here's the thing, we really want to know before /usr/bin/mount or /usr/bin/umount exit whether\n * they established/remove a mount. This is important when mounting, but even more so when unmounting\n * since we need to deal with nested mounts and otherwise cannot safely determine whether to repeat\n * the unmounts. In theory, the kernel fires /proc/self/mountinfo changes off before returning from\n * the mount() or umount() syscalls, and thus we should see the changes to the proc file before we\n * process the waitid() for the /usr/bin/(u)mount processes. However, this is unfortunately racy: we\n *  have to waitid() for processes using P_ALL (since we need to reap unexpected children that got\n * reparented to PID 1), but when using P_ALL we might end up reaping processes that terminated just\n * instants ago, i.e. already after our last event loop iteration (i.e. after the last point we might\n * have noticed /proc/self/mountinfo events via epoll). This means event loop priorities for\n * processing SIGCHLD vs. /proc/self/mountinfo IO events are not as relevant as we want. To fix that\n * race, let's explicitly scan /proc/self/mountinfo before we start processing /usr/bin/(u)mount\n * dying. It's ugly, but it makes our ordering systematic again, and makes sure we always see\n * /proc/self/mountinfo changes before our mount/umount exits. */\n(void) mount_process_proc_self_mountinfo(u-&gt;manager);\n\n\n\nThis was the link in the puzzle I had been missing!  Systemd watches\n/proc/self/mountinfo for changes.  When changes are detected, it\nsets up systemd units by calling mount_setup_unit(). This was a very\ninteresting discovery; I had not been aware of this interaction\nbefore.  
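The watching mechanism itself is simple to demonstrate: per proc(5), the kernel signals a change to /proc/self/mountinfo by raising a priority event (POLLPRI/POLLERR) on a polled file descriptor, after which the file can be re-read. A minimal Python sketch of such a watcher, as an illustration of the mechanism rather than systemd's actual code (Linux-only):

```python
import select

# Watch the kernel mount table the way systemd does: the kernel flags
# /proc/self/mountinfo with POLLPRI/POLLERR when the mount table changes,
# so a watcher can poll the fd and then re-read the file.
with open("/proc/self/mountinfo") as mountinfo:
    poller = select.poll()
    poller.register(mountinfo.fileno(), select.POLLPRI | select.POLLERR)
    # poller.poll(1000) would return once a mount or unmount occurs
    # (or after the 1s timeout); here we just read the current table.
    entries = mountinfo.readlines()
    print(len(entries), "mount entries")
```
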
When I checked with the upstream maintainer of the Linux\nautomounter, it seemed that this interaction surprised him too.\n\nWhen the Linux kernel mounts or unmounts a filesystem, it updates a\ntable of mount-points which can be viewed in a file in the /proc\npseudo filesystem.  Systemd was watching changes to this\nfile – /proc/self/mountinfo – and creating units for\neverything that was mounted.  So there was no direct link between\nsystemd and the automounter, and the problem was not specific to\nautomounting. The same thing would happen when mounting any\nfilesystem, even by using the mount command directly.\n\nLooking at a systemd unit for an automounted directory, my attention\nwas drawn to the Conflicts attribute:\n\n$ systemctl show -p Conflicts home-mark.mount\nConflicts=umount.target\n\n\n\nFrom systemd.special(7):\n\numount.target\n    A special target unit that umounts all mount and automount points \n    on system shutdown.\n\n    Mounts that shall be unmounted on system shutdown shall add\n    Conflicts dependencies to this unit for their mount unit, which is \n    implicitly done when DefaultDependencies=yes is set (the default).\n\n\n\nIt would seem that systemd wants to be responsible for unmounting targets that were\nadded by the Linux automounter.  From systemd.mount(5):\n\nMount points created at runtime (independently of unit files or\n/etc/fstab) will be monitored by systemd and appear like any other\nmount unit in systemd. 
See /proc/self/mountinfo description in proc(5).\n\n\n\nHowever, as the automounter is capable of unmounting its own targets\nand as umount -a at shutdown will also unmount targets, it remained\nunclear why systemd would need to know anything about mount points\nexcept for those that an administrator may have chosen to explicitly\nconfigure as systemd units for dependency resolution.\n\nWhat is draining the queue?\n\nHaving determined that systemd was creating its own mount units, I shifted the\ninvestigation back to the question: why does this problem not happen all the\ntime?  Most of the time the queue stays at zero, suggesting that something is\ndraining it.  If I could understand what was draining the queue, perhaps\nI’d have a better idea of why it sometimes failed to drain fast enough.\n\nThe bus_send_internal() function in systemd ends up with a write to file\ndescriptor 45, as I discovered with this SystemTap probe:\n\nprobe process(\"/usr/lib/systemd/systemd\").function(\"bus_socket_write_message\") {\n    printf(\"%d %d\\n\", pid(), $bus-&gt;output_fd)\n}\n\n\n\nThis could also be seen in the output of strace on PID 1.  So where does fd 45\ngo? 
Who is reading from the other end?\n\n# ls -l /proc/1/fd/45\nlrwx------ 1 root root 64 Nov 20 16:37 /proc/1/fd/45 -&gt; socket:[17738]\n\n# ss -xp | grep 17738\nu_str ESTAB 0 0 * 17738 * 23294 users:((\"systemd\",pid=1,fd=45))\nu_str ESTAB 0 0 /run/dbus/system_bus_socket 23294 * 17738 users:((\"dbus-daemon\",pid=1696,fd=9))\n\n\n\nAccording to an strace of the dbus-daemon, when it reads from fd 9 it writes to\nfd 10, so chasing that down:\n\n# ls -l /proc/1696/fd/{9,10}\nlrwx------ 1 root root 64 Nov 22 13:07 /proc/1696/fd/10 -&gt; socket:[23452]\nlrwx------ 1 root root 64 Nov 22 13:07 /proc/1696/fd/9 -&gt; socket:[23294]\n\n# ss -xp | grep 23452\nu_str ESTAB 0 0 * 18029 * 23452 users:((\"systemd-logind\",pid=2267,fd=12))\nu_str ESTAB 0 0 /run/dbus/system_bus_socket 23452 * 18029 users:((\"dbus-daemon\",pid=1696,fd=10))\n\n\n\nWhen I mount a filesystem, systemd is using dbus\nto send messages to systemd-logind. Is that right?  Why?  And are they all going\nthere?\n\nTo dig deeper, I really wanted to see a count of how many messages\nwere sent out of the write socket from systemd vs. how many were\nreceived by systemd-logind on the read side of the socket. I wrote\nanother stap script, systemd-buffers-trace.stp.\n\nWhile running this script with stap, I mounted 2,148 directories via the\nautomounter and got the following output:\n\nsystemd/bus_send_internal() returned 1, 1339624 times\nsystemd/bus_socket_write_message() sent 1339624 messages -&gt; fd 45\nsystemd-logind/bus_socket_read_message() recv 2666566 messages &lt;- fd 12\nsystemd-logind/bus_socket_make_message() called 1333283 times\nsystemd-logind/bus_process_internal() returned 1, 1333283 times\n\n\n\nTo mount just over 2,000 directories, bus_socket_write_message()\nsent over 1.3 million messages and bus_socket_read_message()\nreceived over 2.6 million messages?!  That’s incredible!  
I thought\nthat 2,000 directories was only a mount-storm-in-a-teacup, but having\nbeen expanded now to over a million messages, the teacup had clearly\nbroken into pieces on the floor.\n\nI added an extra probe to count the number of messages that had the same\ninterface, member and path components received by bus_process_object(),\nand then ran stap while mounting a single directory:\n\nsystemd/bus_send_internal() returned 1, 65 times\nsystemd/bus_socket_write_message() sent 65 messages -&gt; fd 45\nsystemd-logind/bus_socket_read_message() recv 126 messages &lt;- fd 12\nsystemd-logind/bus_socket_make_message() called 63 times\nsystemd-logind/bus_process_internal() returned 1, 63 times\n\nTop 20 messages handled by bus_process_object()\n\nCOUNT INTERFACE MEMBER PATH\n4 \"org.freedesktop.DBus.Properties\" \"PropertiesChanged\" \"/org/freedesktop/systemd1/unit/usr_2dmount_2dapp_2dbucket1_2emount\"\n4 \"org.freedesktop.DBus.Properties\" \"PropertiesChanged\" \"/org/freedesktop/systemd1/unit/usr_2dmount_2dapp_2dbucket2_2emount\"\n4 \"org.freedesktop.DBus.Properties\" \"PropertiesChanged\" \"/org/freedesktop/systemd1/unit/usr_2dmount_2dapp_2dbucket3_2emount\"\n4 \"org.freedesktop.DBus.Properties\" \"PropertiesChanged\" \"/org/freedesktop/systemd1/unit/usr_2dmount_2dapp_2dbucket4_2emount\"\n4 \"org.freedesktop.DBus.Properties\" \"PropertiesChanged\" \"/org/freedesktop/systemd1/unit/usr_2dmount_2dapp_2dbucket5_2emount\"\n4 \"org.freedesktop.DBus.Properties\" \"PropertiesChanged\" \"/org/freedesktop/systemd1/unit/usr_2dmount_2dapp_2dbucket6_2emount\"\n4 \"org.freedesktop.DBus.Properties\" \"PropertiesChanged\" \"/org/freedesktop/systemd1/unit/home_2dmark_2emount\"\n4 \"org.freedesktop.DBus.Properties\" \"PropertiesChanged\" \"/org/freedesktop/systemd1/unit/dev_2dmapper_2dvg01_5cx2ddata_2edevice\"\n4 \"org.freedesktop.DBus.Properties\" \"PropertiesChanged\" \"/org/freedesktop/systemd1/unit/dev_2dmapper_2dvg01_5cx2dvar_2edevice\"\n4 \"org.freedesktop.DBus.Properties\" 
\"PropertiesChanged\" \"/org/freedesktop/systemd1/unit/dev_2dmapper_2dvg01_5cx2dstate_2edevice\"\n4 \"org.freedesktop.DBus.Properties\" \"PropertiesChanged\" \"/org/freedesktop/systemd1/unit/dev_2dmapper_2dvg01_5cx2dtmp_2edevice\"\n4 \"org.freedesktop.DBus.Properties\" \"PropertiesChanged\" \"/org/freedesktop/systemd1/unit/dev_2dmapper_2dvg01_5cx2droot_2edevice\"\n...\n\n\n\nAt first I couldn’t understand this. I mounted a single directory; systemd\nwould have picked up on that because it is watching changes to\n/proc/self/mountinfo and would want to create a new mount unit; but it\nresulted in messages for all the other current mount points on the\nsystem (in fact four messages per mount point).\n\nThen the penny dropped.  Whenever anything in the mount table\nchanged, systemd re-communicated every single mount point over the\nbus to systemd-logind – and it repeated this for each individual\nchange. So when we mounted over 2,000 directories, one at a time,\neach new mount point caused the entire current list of mount points\nto be communicated again over the bus – and, of course, that list\nwas growing each time.  That’s why we were seeing millions of\nmessages on the bus. A mount storm had produced a tsunami\nof IPC messages that was swamping the system.\n\nSummary\n\nUnder normal operating conditions, the systemd-logind process manages to read\nall of the messages destined for it on its channel in the message bus\nfast enough.  However, if systemd-logind happens to be busy at the time\nprocessing other messages, the queue fills up and anything that relies\non the systemd bus chokes. This is what caused our logins and cron\njobs to fail.  
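The arithmetic behind the blow-up is easy to sanity-check. If mounting the k-th directory triggers a re-broadcast of all k mount points seen so far, then N mounts cost a triangular number of messages rather than N. This is only a rough model of the mechanism described above (the real counts also include pre-existing mounts, replies, and several signals per unit, so they differ from this figure), but it lands in the right ballpark:

```python
# Rough model: each new mount triggers a re-broadcast of the whole
# (growing) mount list, so the total is a triangular number.
def storm_messages(n_mounts):
    return sum(k for k in range(1, n_mounts + 1))

print(storm_messages(2148))  # 2308026 -- millions of messages, versus
                             # ~2000 if each mount produced one message
```
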
During the investigation, I discovered that mounting\njust over 2,000 directories via the Linux automounter caused systemd to\nproduce millions of messages over dbus.\n\nThere is a partial\nfix for this issue\nupstream that attempts to address the problem by a form of rate\nlimiting. Unfortunately, it was backed out from the upstream master\nbranch and apparently needs some work.  I have proposed a way\nforward and hope\nthat a final solution will be forthcoming in the not-too-distant\nfuture.\n",
        "url"      : "https://blog.janestreet.com/troubleshooting-systemd-with-systemtap/",
        "image"    : "https://blog.janestreet.com/troubleshooting-systemd-with-systemtap/data-taps.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Using Python and OCaml in the same Jupyter notebook",
        "date"     : "December 16, 2019",
        "authorId" : "lmazare",
        "author"   : "Laurent Mazare",
        "tags"     : [],
        "minsToRead" : 14,
        "content"  : "\nThe cover image is based on Jupiter family by NASA/JPL.\n\n\nOCaml is an amazing programming language to write industrial strength libraries\nand systems. At Jane Street we use it for literally all of our production\nsystems, including for FPGA design,\nweb development,\nand even machine learning.\n\nHowever, for certain tasks we have found a different workflow to be\nhighly effective: using Python with its lightweight syntax and huge\necosystem of libraries (numerical analysis, plotting, machine learning\netc) inside a Jupyter notebook. This workflow\nis very convenient for iterating quickly, especially if the code is\nonly meant to be run once. This often happens in quantitative research\nwhere one wants for example to quickly load a time series from a csv\nfile, plot it, and compute some variance or correlation metrics using\nNumPy.\n\nHowever it is crucial for us to be able to reuse our existing OCaml\nservices and systems in this workflow. So we created a way to expose\nthe services of our OCaml systems to Python users. Importantly, we\nwant this to work in a way that the OCaml developers of those systems\ncan create the Python bindings without requiring a deep\nunderstanding of Python itself.  In our solution we provide\ntransparent access for Python users to these systems by building on\npyml which provides OCaml\nbindings to the Python C\nAPI.\n\nIn this blog post, we discuss how OCaml libraries can be called from Python as\nwell as the other way around. 
We leverage pyml to write bindings that wrap functions\nfrom one language so that they can be used in the other.\nWe introduce a ppx extension and a library to make writing such bindings easier.\nAnd finally we show how all this can be used to allow both Python and OCaml code\nto run in the same notebook and seamlessly exchange values between the two worlds.\nIn the screenshot below an OCaml function is used to evaluate a\nReverse Polish Notation\nexpression and this function is called from Python. You can try this in your web browser either using\nGoogle Colab,\nor with binder.\n\n\n\nCalling Python from OCaml\n\nFirst, let us look at how to call Python code from OCaml. This is not our\nprimary use case but it nicely demonstrates the pieces involved.\n\nThe pyml library provides OCaml bindings to the Python C API.\nUsing these bindings, the OCaml code can start the Python runtime and\ninteract with it by building Python values or modules, calling methods, etc.\nBelow is a simple example using Python to concatenate some strings.\n\nlet () =\n  (* Initialize the Python runtime. *)\n  Py.initialize ();\n  (* Create a Python object for the string \"-\". *)\n  let sep = Py.String.of_string \"-\" in\n  let foobar =\n    (* Call the .join method on the sep object with a single\n       argument that is a list of two strings. *)\n    Py.Object.call_method\n      sep\n      \"join\"\n      [| Py.List.of_list_map Py.String.of_string [ \"foo\"; \"bar\" ] |]\n    (* Convert the result back to an OCaml string. *)\n    |&gt; Py.String.to_string\n  in\n  Printf.printf \"%s\\n\" foobar\n\n\n\nThe type for Python values is called pyobject. An OCaml string can be converted to such\nan object via Py.String.of_string. The resulting pyobject can be converted back to\nan OCaml string via Py.String.to_string.\n\nIf the argument given to this last function happens not to be a Python\nstring, an exception is raised at runtime. E.g. 
the following code\ncompiles correctly but raises an exception when run:\n\nlet () =\n  ignore (Py.String.to_string (Py.Int.of_int 42) : string)\n\n\nThe exception is raised in the Python runtime, caught by pyml, and converted to an OCaml\nexception. It is pretty useful in this context as it provides some details\nabout what went wrong.\nFailure \"Type mismatch: String or Unicode expected. Got: Long (42)\".\n\n\n\nIt is even possible to run some Python code using Py.Run.eval. This\nevaluates a string containing a Python expression and returns the result as a pyobject.\nFor example the following bit of OCaml code properly returns the integer 285.\nPy.Run.eval \"sum([n*n for n in range(10)])\" |&gt; Py.Int.to_int\n\n\n\nVarious other examples can be found in the\nreadme of the pyml GitHub repo.\n\nA PPX extension: ppx_python\n\nThe next problem is converting values between Python and\nOCaml. Converting simple values such as ints or strings is easy,\nhowever handling more complex types this way would be very cumbersome,\nso we wrote a PPX syntax extension to help automate this:\nppx_python.\n\nAnnotating an OCaml type t with [@@deriving python] results in two\nfunctions being automatically generated:\n\n\n  python_of_t: t -&gt; pyobject converts an OCaml value of type t into a Python object value.\n  t_of_python: pyobject -&gt; t converts a Python object value into a value of type t.\n\n\nThe conversion is straightforward for basic types such as int, float, bool, or string.\nunit is converted to None.\nOCaml tuples are converted into Python tuples. OCaml lists and arrays\nare converted to Python lists.\n\nFor OCaml options, None is used on the Python side to represent the None variant.\nOtherwise the value is directly available. 
Note that this makes the two OCaml values\n[Some None] and [None] indistinguishable on the Python side as both are represented\nusing None.\n\nRecords are represented using Python dictionaries whose keys are strings.\nThe [@python.default] attribute can be used on some of the fields to\nmake them optional on the Python side: when not present the default\nvalue gets used.\n\nFor example, one can write the following OCaml code:\ntype t =\n  { foo : int [@python.default 42]\n  ; bar : float\n  } [@@deriving python]\n\n\n\nThe Python dictionary { 'bar': 3.14 } would then be converted\nto the OCaml record { foo = 42; bar = 3.14 } and vice versa.\n\nCalling OCaml from Python\n\nWith these conversion functions in place, we wrote a small library\npythonlib on top of\npyml. The goal is to make writing Python bindings to OCaml services\n(using OCaml!) as simple as possible.\n\nThe library has been heavily inspired by Core’s command-line processing module;\nsee this section\nof the Real World OCaml book for more details on Command.\nThe parameters are specified using an Applicative and we can use the let-syntax\nextension let%map_open from ppx_let to simplify the syntax.\n\nIn this example the OCaml code defines a Python function that takes a single\npositional argument n of integer type; the OCaml code then performs some\ncomputations based on n and returns the resulting float value.\nWe attach the function to a newly defined Python module named ocaml.\n\nopen Base\n\nlet approx_pi =\n  let%map_open.Python_lib n = positional \"n\" int ~docstring:\"the value n\"\n  in\n  let sum =\n    List.init n ~f:(fun i -&gt; let i = Float.of_int (1 + i) in 1.0 /. (i *. i))\n    |&gt; List.reduce_exn ~f:(+.)\n  in\n  Float.sqrt (sum *. 6.) 
|&gt; python_of_float\n\nlet () =\n  if not (Py.is_initialized ())\n  then Py.initialize ();\n  let mod_ = Py_module.create \"ocaml\" in\n  Py_module.set mod_ \"approx_pi\" approx_pi\n    ~docstring:\"computes a slowly convergent approximation of pi\"\n\n\n\nThis code is compiled to a shared library ocaml.so, together with a small C\nlibrary defining the PyInit_ocaml function that starts the OCaml runtime and\nexposes this module.\n\nWhen using Python, it is then possible to import the ocaml module and use the approx_pi\nfunction as long as the ocaml.so file can be found in the Python path.\n\nimport ocaml\nprint(ocaml.approx_pi(1000))\n\n\n\nThere are several advantages to using pythonlib:\n\n\n  The types of arguments are automatically checked, and values get converted between the appropriate\nPython and OCaml types.\n  Calling the Python function with an incorrect number of arguments, or with improper argument names\nfor keyword arguments, results in easy-to-understand runtime errors.\n  Documentation gets automatically generated and attached to the Python\nfunction, including the name of the parameters, their types, and some\nuser-specified content. This documentation is available when using\ncompletion in Jupyter.\n\n\npythonlib handles basic types such as int, float, string, list, etc., and is easy to extend to\nmore complex types, e.g. by using ppx_python.  
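As a sanity check on the binding above, the same computation can be written directly in Python (a plain re-implementation for illustration; it is not part of pythonlib):

```python
import math

# Pure-Python version of the approx_pi computation bound above:
# sqrt(6 * sum(1/i^2 for i = 1..n)) converges slowly to pi.
def approx_pi(n):
    return math.sqrt(6.0 * sum(1.0 / (i * i) for i in range(1, n + 1)))

print(approx_pi(1000))  # ~3.1406, approaching math.pi as n grows
```
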
Further examples can be found in the\nexamples directory\nof the GitHub repo.\n\nRunning Python and OCaml in the same notebook\n\nFinally, to take it one step further, it is even possible to mix and\nmatch OCaml and Python freely in the same notebook.\n\nJupyter supports a surprisingly large number of programming languages via\ndifferent kernels.\nA couple of years back, a blog post\n“I Python, You R, We Julia”\nshowed how to allow for cross-language integration in Jupyter rather than just relying\non a single language per notebook.\n\nIn order to allow for evaluating OCaml expressions in a Python environment, we wrote some\nbindings for the OCaml toploop module which is used by the OCaml Read-Eval-Print loop.\nWe expose two main functions in these\nbindings:\n\n\n  eval takes as input a string, parses it to a list of phrases and evaluates these\nphrases using Toploop.execute_phrase. OCaml exceptions are caught and converted to\na Python SyntaxError.\n  get takes as input one or two strings. When given two strings the second one represents\n some OCaml code, and the first one represents its expected type. This again\n parses and evaluates the string containing the OCaml code and\n converts the generated value using the provided type representation.\n If only one string is provided, the type representation is inferred by the OCaml compiler.\n\n\nIn the following example code, the call to toploop.eval results in some output being\nprinted. 
The call to toploop.get returns a Python function that takes as input a list\nof pairs and returns some formatted version of this list.\n\nfrom ocaml import toploop\n\ntoploop.eval('Printf.printf \"hello from ocaml\\n%!\";;')\nocaml_fn = toploop.get(\n    '(int * string) list -&gt; string list',\n    'List.map (fun (d, s) -&gt; Printf.sprintf \"%d: %s\" (d+1) (String.uppercase_ascii s))')\n\nprint(ocaml_fn([(3141592, 'first-line'), (2718281, 'second-line')]))\n\n\n\nIn order to make calling the OCaml bits easier when using Jupyter, we added some Jupyter\ncustom magics\nso that one can write OCaml cells, e.g.:\n%%ocaml\nlet rec fact x = if x = 0 then 1 else x * fact (x-1);;\n\nPrintf.printf \"fact(10) = %d\" (fact 10);;\n\n\n\nAnd also easily inline OCaml code in Python cells, e.g.\n# Returns a Python function wrapping the recursive factorial implementation defined above.\nf = %ocaml fact\nf(10) # Apply f to 10 in Python.\n\n\n\nThe docstring associated with an OCaml function contains its OCaml type.\nThis appears when using Jupyter’s completion.  Closures can also be passed to and returned\nby functions; e.g. the OCaml map function can be made available to Python via the following\nsnippet, and a Python closure or function can be passed as the f argument.\n\nmap_fn = %ocaml fun (x, f) -&gt; List.map f x\nprint(map_fn([1, 2, 3], lambda x: x*x))\n\n\n\nWe’ve created a small pip package for the ocaml python module.  You\ncan install this using pip install ocaml. Once this is done, the\nocaml module can be imported from Python.  You can even run this on\nGoogle Colab by using this\nnotebook\nor in\nbinder.\nAmong several examples, the notebook includes a function computing the\nnumber of ways to place N queens on a checkerboard. 
Note that this\npackage is not yet very polished, but it should give some\nideas of what can be done through this Python-OCaml integration.\n\n\n\nWe also compared the OCaml and Python implementations on this N-queens problem. This is far\nfrom a realistic benchmark, but the OCaml version still ends up being a lot faster\nthan the Python one. Note that with the toploop module, the OCaml code is evaluated\nby compiling to bytecode, which is not optimal; switching to the opttoploop module,\nwhich generates native code, should make it even faster.\n\n\n\nNext Steps\n\nWe have been successfully using both the Python-OCaml and OCaml-Python integration\ninternally at Jane Street for a couple of months now. Making it easy for Python users\nto access OCaml services has been a big win for us.\n\nMost of our bindings wrap OCaml functions that rely on Async for various I/O\noperations. We currently block on such calls. However, we plan on interfacing\nthis with the newly introduced Python \nasync/await syntax so\nthat OCaml and Python tasks can be run concurrently.\n\nA possible future use case would be to provide Python bindings for some of our\ncore OCaml libraries, for example to parse and handle s-expressions with\nsexplib.  Another interesting library\ncould be\nCore Time_ns,\nwhich is used to represent timestamps (as the timestamp representation in\nPython is hard to wrap your head around).\n\nFinally, when mixing Python and OCaml in the same notebook, custom OCaml types are\nnot handled when using the %ocaml magic. It is possible to work around this using\na toploop.add_named_type function, but this is currently a bit brittle, so\nwe certainly plan on improving this.\n",
        "url"      : "https://blog.janestreet.com/using-python-and-ocaml-in-the-same-jupyter-notebook/",
        "image"    : "https://blog.janestreet.com/using-python-and-ocaml-in-the-same-jupyter-notebook/python-ocaml.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Deep-Learning the Hardest Go Problem in the World",
        "date"     : "December 6, 2019",
        "authorId" : "dwu",
        "author"   : "David Wu",
        "tags"     : [],
        "minsToRead" : 13,
"content"  : "Updates and a New Run\n\nFirst, some brief updates:\n\nEarlier this year, I posted about our project KataGo and research to improve self-play learning in Go, with an initial one-week run showing highly promising results.\n\nSeveral months later in June, KataGo performed a second, longer 19-day run with some major bugfixes and minor optimizations. Starting from scratch and with slightly less hardware than before, up to 28 V100 GPUs, it reached and surpassed the earlier one-week run in barely more than the first three days. By the end of the 19 days, it had reached the strength of ELF OpenGo, Facebook AI Research’s multi-thousand-GPU replication of one of AlphaZero’s runs - equating to roughly a factor of 50 reduction in computation required. Our paper has since been significantly revised and updated with the new results:\n\nAccelerating Self-Play Learning in Go (arxiv paper)\n\nThis version of KataGo has also been available to the Go player community for several months now. With the results of this second run, KataGo is comparable to other top-level open-source bots and is now one of the more popular bots online for its ability to play handicap games - to come up with strong moves even starting at a great disadvantage by secondarily optimizing for score, something that ELF and other “AlphaZero”-trained bots fail to do well [1]. KataGo is also popular for game analysis in several of the major analysis GUIs, thanks to its ability to estimate score and ownership and to adjust to alternate board sizes and rules.\n\n\n\n\nRelative Elo rating (y-axis) of KataGo's run versus Leela Zero (LZ) and ELF, plotted by *rough* self-play computation required (x-axis, log scale). 
Note: Leela Zero is likely depicted quite over-favorably compared to either KataGo or ELF, but also a little under-favorably for the very earliest parts of their run due to a variety of technical details and differences in their run - see paper.\n\n\n\n\nThe Hardest Go Problem in the World?\n\nAlong the way, development and research in KataGo has spawned for us many new questions and side-branches to explore. One of these questions revolved around a challenge: understanding the “hardest” Go problem in the world.\n\nIgo Hatsuyoron is a classic problem collection dating back to 1713, compiled and much of it likely composed by Inoue Dosetsu Inseki, the head of the Inoue house of Go and holding the title of Meijin, the strongest recognized player in Japan. Igo Hatsuyoron is still widely-recognized today as one of the most difficult and high-caliber Go problem collections in existence.\n\nIn particular, the 120th problem in the collection is often considered to be the deepest and hardest Go problem ever composed. Its main line features a counterintuitive “hanezeki” formation, and buried within its depths are many variations involving enormous trades, knife-edge races between huge groups balanced with no margin for error, and subtle early choices that have effects as much as 100 moves later. Despite many total person-years of study and even a few published books devoted to analyzing the problem (for example: here at amazon.cn), the problem is still considered unsolved, with many classical solutions seemingly refuted and modern lines still not entirely certain.\n\n\n\n\nThe starting position of Igo Hatsuyoron 120. Black to play. If both sides play optimally, what is the result?\n\n\n\n\nAs far as we are aware, all prior bots including those developed since AlphaZero, almost completely fail to understand this problem. 
The highly unusual shapes cause them to completely miss many of the key moves and ideas, and the incredible precision of the fights prevents the generic knowledge that they have learned from being effective, the June version of KataGo included. In fact, even in ordinary games, long-distance/large-scale fights and blind spots of specific unusual shapes are known weaknesses of current otherwise superhuman bots[2], and sometimes humans can outdo them in those situations - and this problem hits both such weaknesses hard.\n\nCould a bot be trained to understand this problem?\n\nNormally, training consists of self-playing hundreds of thousands of ordinary games. We learned from AlphaZero that this makes the bot very, very good at ordinary games.\n\nSo what if self-play consisted instead of exploration of this specific problem? Would the bot then master this problem?\n\nIt seemed possible or even likely, but it seems nobody had tried it yet. So I contacted Thomas Redecker, one of the authors of a website containing an excellent modern analysis of the problem and who had earlier publicized some of the ways that modern bots uniformly all fail, and began training KataGo specifically on it.\n\nTraining Summary\n\n  In total, KataGo trained for 1 week on this problem on the same 28 GPUs as before, beginning from the strongest neural net from the end of its second run, around the full strength of ELF OpenGo.\n  Training used much the same self-play process as the earlier June run did, including all the same learning enhancements[3] and the full learning rate used during most of the June run. The primary difference was that rather than only starting games from the initial empty board, games were also started from Igo Hatsuyoron 120.\n  About 70% of self-play games started from positions in Igo Hatsuyoron 120. 
30% were kept as ordinary games to regularize and to keep KataGo effective at ordinary games.[4]\n  Among the 70%, we started about 1/6 from the initial position of Igo Hatsuyoron 120, and about 5/6 from random positions drawn from subvariations of prior human solutions and some of the bot’s own preferred lines in the first few days of training.\n    \n      We started from subvariations because we specifically wanted KataGo to be able to analyze numerous prior human solution attempts, and also because we were not sure initially whether a bot could learn the problem well at all (large-scale-fight perception and unusual-shape blind spots being precisely two of the weaknesses of modern bots).\n      Random starting positions were chosen with probability exponentially decreasing in turn number such that each turn number was roughly equally likely. This is because uniform weighting would overwhelmingly weight positions at the very end of the game with almost nothing left to play, since the number of variations branches exponentially.\n      KataGo was not informed of the human-believed best move or evaluation in any position - positions were used only as the initial board state for self-play games.\n    \n  \n  Since the goal of the problem is to find the score-optimal result, not merely to win, we variably set komi in any starting position to the value that KataGo believed at that time would be fair, making score-precise play also necessary for winning/losing. 
(Plus mild additional normally-distributed noise, sigma = 1 point, for regularization).\n  We approximately cut in half the lookback window from the end of the June run for sampling training data for gradient updates, to reduce the amount of time for new Igo Hatsuyoron 120 data to fill the window and to reduce the overall latency of the learning loop for adapting to the new data.\n\n\nOverall, the training process was somewhat informal and exploratory, being tacked on to the end of an existing strong regular run and borrowing many of its hyperparameters without further consideration. Repeating it with more careful setting of hyperparameters and with additional runs would be useful to get a better controlled comparison of methods - our goal here was just to see how well this would work at all.\n\nResults\n\nThe progress of training was fun to watch. From limited anecdotal observation of KataGo’s preferred lines during training, KataGo was constantly re-discovering and changing its mind between many known human variations at different points. It seemed much of the week of training was “necessary”, with KataGo re-discovering the “guzumi” move - one of the strangest moves found earlier by humans and believed to be part of the main line - only in the last few days.\n\n\n\n\nThe surprising guzumi move.\n\n\n\n\nAs far as we can tell, KataGo now displays an overall strong understanding of the elements of this problem and displays a sensitivity to even single-liberty differences in the major fights. It confirms most of the surprising tactics found earlier by humans, and also finds what appears to be a small number of apparent mistakes and/or simpler refutations in some side variations of human analyses.\n\nAnd excitingly, it also suggests a few new moves along the main line! 
Whereas the tentative best known human line led to a win by 3 points for Black, as a result of these new moves, KataGo seems to believe that White will win by 2 points, or at least by 1 point.[5]\n\n\n\n\nKataGo's three new variations.\n\n\n\n\nDo these moves work and is this solution correct? Some human study of these moves has already been started and seems to support their soundness, but it is not entirely clear yet. And of course, given the vast depth of the problem, it’s hard to rule out further training or future research uncovering other moves elsewhere. KataGo’s own understanding also does not appear to be perfect. In a few lines, it gives overconfident or underconfident estimates with large error relative to deeper evaluations from playing out the line, and in some lines it can require significant numbers of playouts (at least tens of thousands) before it reliably settles on what seem to be the right moves. However, at a minimum KataGo is now at the point of being a powerful analysis aid for this problem and is the first bot to have a good understanding of it.\n\nKataGo’s tentative main line can be viewed here:\n\n(download sgf)\n\nFuture Work\n\nOur training run here was fairly informal and exploratory, and still leaves many open questions:\n\n\n  \n    How reliably would the results replicate in repeated and more carefully-controlled training runs?\n  \n  \n    How well would a bot do if trained only from the starting position of Igo Hatsuyoron 120 - would it explore enough to avoid getting stuck, and how much slower would it converge?\n  \n  \n    Is there a cheaper way to make a bot good at “constructed” problems than to run full-scale self-play training on the problems individually?\n  \n  \n    Due to problem-specific training, KataGo seems to have dodged the large-scale-fight weaknesses and unusual-shape weaknesses common to Zero-trained bots for this specific problem, but is there a more general way to remedy those 
issues?\n  \n  \n    And of course, for Go players - are there still more surprises or new moves left to be discovered in this beautifully well-balanced and deep problem?\n  \n\n\nThe final trained neural net is available here for anyone to use with KataGo to explore and analyze on their own.[6]\n\nFor anyone else who might be interested in trying an independent replication or otherwise experimenting with such methods, we hope these are some fun and interesting questions that can be explored in the future!\n\nAcknowledgements\nMany thanks to Michael Redmond (professional 9 dan) for consultation and preliminary analysis of KataGo’s variations, and to Thomas Redecker, Harry Fearnley, and Joachim Meinhardt for collaboration and consultation on this project.\n\nEndnotes\n[1]: When well behind and every move seems almost 100% losing, without a secondary signal such as score, AlphaZero-style bots lose any idea of how to play and begin to behave extremely poorly.\n\n[2]: We suspect large-scale-fight perception and blind-spot weaknesses in Go are related to more general issues in modern deep reinforcement learning regarding the robustness of learned models and insufficient exploration of certain strategies. Although Go bots are still superhuman on average, such failings seem to be echoes of the same kinds of issues encountered by AlphaStar and OpenAI Five that prevented them from achieving clearly superhuman levels.\n\n[3]: One significant additional modification not part of the June run was used as well, which was to overweight the frequency of positions with high policy KL-divergence (essentially, high policy loss) in the training data. 
Some time after the June run but before this project, we’d separately found this to be a mild improvement for training, and hope to incorporate it more generally in a third major run in the next few months.\n\n[4]: Actually, the ratio used was 50-50 initially, but it was raised shortly after starting based on some intuition that more focused training would be good and from noticing that Igo Hatsuyoron games would be underweighted as a proportion of the data at 50-50 since they last for many fewer moves than normal games due to the board already being partly filled up.\n\n[5]: There is some uncertainty about the final difference due to the fact that like all “AlphaZero”-based bots to-date, KataGo does not yet natively support Japanese rules, due to difficulties in formalizing that ruleset for self-play (and Igo Hatsuyoron is a Japanese-composed problem). Due to a parity difference in how the score is defined, the Japanese rules disagree with more computer-friendly “area” or “Chinese” rulesets by 1 point a nontrivial fraction of the time.\n\n[6]: For anyone who does find anything, a good venue for discussion might be the Life in 19x19 forums such as in this thread here - note: these forums are in NO way affiliated with Jane Street, they are external discussion forums about Go.\n",
        "url"      : "https://blog.janestreet.com/deep-learning-the-hardest-go-problem-in-the-world/",
        "image"    : "https://blog.janestreet.com/deep-learning-the-hardest-go-problem-in-the-world/goproblem.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Commas in big numbers everywhere: An OpenType adventure",
        "date"     : "October 14, 2019",
        "authorId" : "thume",
        "author"   : "Tristan Hume",
        "tags"     : [],
        "minsToRead" : 13,
        "content"  : "My job involves a lot of staring at large numbers, mostly latencies in\nnanoseconds, and picking out magnitudes like microseconds. I noticed\nmyself constantly counting digits in my text editor, in my terminal,\nand in Jupyter notebooks in my browser.\n\nI could have made specific solutions for each, but I thought “How\ncould I solve this in the most general way possible?” and an idea came\nto me: I could make a font that uses fancy font shaping features to\ninsert commas in all numbers, everywhere.\n\nIt was a fun idea, so I decided to play with it on my own time. After\nan evening reading documentation and a Sunday spent tinkering, I had it\nworking! I’ve been using the resulting font at work for a few weeks now\nand it’s been a really pleasant improvement.\n\nIn this post I’ll describe the basics of OpenType shaping, how I used\nit to make a font that emphasizes digit grouping, and how I extended it\nwith even fancier variations.\n\nI call it “Numderline.” Below you can see a preview of the main\nvariant. For more samples or to download the font, see\nhere, or read on.\n\n\n\n\n The flagship underlining variant of the Numderline font. Notice how\n digits are grouped visually.\n\n\nLearning how font shaping works\n\nI knew that font shaping was\nextremely\npowerful\nand driven by “tables” inside fonts, but I had no idea what the tables\nwere like and how the process of shaping worked. So I found the\nOpenType\nspecification\nand learned that the word “table” is used in a very loose sense just to\nrefer to some defined binary data structure.\n\n\nFont shaping is the process of mapping a string of unicode text (using\nthe “cmap” table) to a sequence of “glyphs.” Those glyphs have various\nsubstitutions applied to them, for example turning the sequence “ட,ு”\ninto the “டு” glyph, the substitutions coming from a multi-level\nhierarchy in the “GSUB” table. 
Then, the positions of the glyphs are\nadjusted using information in the “GPOS” table, for example to place\naccent glyphs in the correct location. The final result of font shaping\nis a sequence of positioned glyph IDs, which is rendered on your screen\nusing the information in the “glyf” table on the shape of each glyph\n(usually caching a rendered version of each glyph in an “atlas” for\nefficiency).\n\nGSUB seemed like the table I wanted, but as I read through its various\nsubstitution types, I became worried my plan wouldn’t work. They all\nseemed to work forward through the text, whereas I needed to count\ndigits backwards from the end of a number! Luckily at the very end\nof the list I found the feature I needed, intended for shaping Arabic\ncalligraphy: “Reverse Chaining Contextual Single\nSubstitution.”\n\nSubstitution rules work a bit like a limited form of regular\nexpressions. You provide “classes” of glyphs, which are basically just\nlists of glyph IDs. Reverse chaining substitution matches zero or more\nbacktracking classes: a single class for the glyph to be replaced,\nthen zero or more lookahead classes. If it matches, it substitutes\nusing a mapping table, which provides a replacement glyph for every\nglyph in the matching class. If multiple reverse chaining\nsubstitutions are provided, they can all chain with each other.\n\nAt first I thought I might need to build these binary tables by hand,\nbut after some more research I discovered that font designers often\nuse a language defined by Adobe called “feature\nfiles,”\nwhich compiles down to OpenType tables.\n\nHere’s an example of what a feature file looks like. 
It makes strings\nof vowels alternate in capitalization, starting from what they were on\nthe right:\n\n\n# Tell OpenType to use a system with fancy shaping features for latin characters\nlanguagesystem DFLT dflt;\nlanguagesystem latn dflt;\n# Classes can be given names and you can reference glyphs by name\n@lowercase=[a o e u i];\n@uppercase=[A O E U I];\n\n# Substitutions are put under \"features\", which have different capabilities.\n# The one I use is \"contextual alternates\", which is enabled by default\n# and allows contextual substitutions.\nfeature calt {\n    # Matches a lowercase followed by a lowercase and replaces it with\n    # the corresponding uppercase version, and vice versa.\n    # The ' signifies the glyph in the pattern to be substituted.\n    # This turns AoeUI into AoEuI\n    reversesub @lowercase' @lowercase by @uppercase;\n    reversesub @uppercase' @uppercase by @lowercase;\n} calt;\n\n\n\nUsing the knowledge\n\nI realized that instead of inserting commas, I wanted to underline\nalternating groups of 3 digits so that the font would work in monospace\ncontexts. This would require keeping track of each digit’s position\nfrom the right of the number modulo 6. Unfortunately the only way to\nkeep track of state is by replacing glyphs: notice how in the above\nexample we effectively count modulo 2 by matching on the case (state)\nof the glyph on the right and ensuring the alternating\nuppercase/lowercase pattern continues.\n\nSo I needed to make six copies of each digit glyph, corresponding to\nthe six possible states. For instance I would have six different “4”\nglyphs, each with a different glyph ID that I could match on\nseparately, with the first three glyphs looking normal and the last\nthree having an underline. 
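The intended assignment of digits to glyph copies can be sketched as a small Python simulation (purely illustrative; the function names here are invented, and the real logic lives entirely in the OpenType substitution rules rather than in any code like this):

```python
def digit_states(number):
    # Assign each digit the state its glyph copy would carry: the digit's
    # position from the right of the number, modulo 6. Copies 0-2 look
    # normal; copies 3-5 carry the underline.
    digits = str(number)
    n = len(digits)
    return [(d, (n - 1 - i) % 6) for i, d in enumerate(digits)]

def render(number):
    # Mark digits whose glyph copy is underlined with a trailing underscore.
    out = ''
    for d, state in digit_states(number):
        out += d + ('_' if state in (3, 4, 5) else '')
    return out

print(render(1234567))  # -&gt; 12_3_4_567, i.e. the thousands group 234 underlined
```

This reproduces the alternating-groups-of-3 effect: in 1234567 the group 234 is underlined while 1 and 567 are not. (The full font adds the refinements described below: skipping numbers under 4 digits, numbering digits after a decimal point left to right, and a separate state for comma placement.)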
My feature file would have six class\ndefinitions for each set of the digits 0-9, and my substitutions would\nsubstitute one class for another based on the class it matched from\nthe right.\n\nI looked up how the Powerline font\npatcher worked and read the\nFontForge API\ndocumentation. I modified the patcher to make\nmultiple copies of each digit glyph and to use string templating to\ngenerate a feature file with class definitions that reference the\ndifferent sets of copies followed by substitutions that use them.\n\nI then needed a way to test the resulting font, but I didn’t want to\nhave to reinstall the font constantly and worry about stale caches. I\nended up using a test web page which loaded the font as a web font.\n\nI implemented the counting modulo six using “reversesub” rules, but\nthe FontForge API segfaulted when I tried to compile the feature file!\nIt turns out there’s been an open issue about this for years but\n“reversesub” rules are so infrequently used that it hasn’t been\nfixed. So I switched to using\nfontTools to add in my\nfeature files, and it worked!\n\n\n\nNow I just needed to modify some of the copies of the digits to display\nthe digit grouping. In order to add the underline I found the\nunderscore glyph and pasted it on top of the digit copies corresponding\nto 3 to 5 mod 6 from the right. When used on a font with a sufficiently\nwide, low, and thin underscore like DejaVu Sans Mono, these alternating\ngroups of 3 underlined digits looked reasonably pleasant.\n\nImprovements\n\nI had accomplished my basic goal, but I still had other ideas. I\nmodified the feature file code so that it wouldn’t touch numbers less\nthan 4 digits, and it numbered digits after the decimal place left to\nright instead of right to left. I also added a debug mode that pasted\nthe index of the copy under each glyph in the copied digit set to\nvisualize how the font was working:\n\n\n\nNote that there’s a “6” copy that’s different from the “0” copy. 
This\nwas so that I could implement my original goal of inserting commas by\npasting the comma next to the correct digit, but without having a\ncomma to the right of the number, or to the left of 3-digit numbers. I\nalso made grouping digits after the decimal point optional so that\ninserting commas didn’t look as weird.\n\n\n\nThe underlying text above has no commas and that selection is one\ncharacter! A friend suggested I try squishing groups of 3 digits\ntogether, so I tried that as well, by scaling and translating\ndifferent copies in different ways:\n\n\n\nAnd then I combined it with the commas to make a version that can\ninsert commas even in monospace contexts:\n\n\n\nAfter all these improvements my final feature files looked like this:\n\nlanguagesystem DFLT dflt;\nlanguagesystem latn dflt;\n@digits=[zero one two three four five six seven eight nine];\n# class definitions for all my copies of the digit glyphs\n@nd0=[nd0.0 nd0.1 nd0.2 nd0.3 nd0.4 nd0.5 nd0.6 nd0.7 nd0.8 nd0.9];\n# [clipped] lines for @nd1 through @nd5 ...\n@nd6=[nd6.0 nd6.1 nd6.2 nd6.3 nd6.4 nd6.5 nd6.6 nd6.7 nd6.8 nd6.9];\n\nfeature calt {\n    # Number glyphs after a period left to right by forward chaining off an @nd2\n    sub period @digits' by @nd2;\n    sub @nd2 @digits' by @nd1;\n    sub @nd1 @digits' by @nd6;\n    # [clipped] lines for @nd5 through @nd3 ...\n    sub @nd3 @digits' by @nd2;\n\n    # Only convert numbers with at least 4 digits\n    sub @digits' @digits @digits @digits by @nd0;\n    # Chain to mark rightmost digit as @nd0\n    sub @nd0 @digits' by @nd0;\n\n    # Chain in reverse from the rightmost @nd0\n    reversesub @nd0' @nd0 by @nd1;\n    reversesub @nd0' @nd1 by @nd2;\n    # [clipped] lines for @nd3 through @nd5 ...\n    reversesub @nd0' @nd5 by @nd6;\n    reversesub @nd0' @nd6 by @nd1;\n} calt;\n\n\n\nUsing it at work\n\nAt work I used the StyleBot Chrome\nextension\nto inject custom CSS in my Jupyter notebooks,\nwhich made my Pandas tables right-aligned,\nand used 
my font, so that I could now more easily parse the columns of\nlarge numbers. I also switched to the Kitty\nterminal, one of the few\nthat supports font shaping, and set it up with my font so that I could\nuse it with Jane Street’s command line data retrieval\ntools.\n\nUnfortunately I couldn’t use it in my text editor since Emacs doesn’t\nsupport font shaping, and while Sublime Text 3 does, it has an\noptimization where it doesn’t apply font shaping to alphanumeric\ncharacters to save space in the shaping cache.\n\nI’ve been really enjoying it for the couple weeks I’ve used it – it\nmakes visually parsing tables easier, and in my Jupyter notebooks,\neven though Python 3 now supports underscores in numbers, I don’t need\nto manually add them and update them when I change a number\nanymore. My font makes them obsolete.\n\nI’ve also gotten lots of interest in using the font from my coworkers,\nbecause it turns out many people at Jane Street spend time staring at\nvarious types of large numbers.\n\n\n\n\nA custom stylesheet makes Jupyter notebooks use Numderline and right-align Pandas tables\n\n\nConclusion\n\nYou can preview the various versions of the font at\nhttps://thume.ca/numderline/ and download pre-patched versions of\nsome selected fonts, or look at the\nsource and use the patcher\nyourself.\n\nOn the one hand, this whole project seems like a hack which uses font\nshaping for something it’s not intended to do, but on the other hand I\nthink that font shaping is the natural place to apply stylistic features\nthat make text easier to read.\n\nOver the years, as font shaping has rolled out more advanced features\nand made more languages display better in more places, I think it was\na failure of imagination to not improve the display of English text\nand numbers as well. I think digit grouping should always have been\nthe job of font shaping. 
Indeed, it is the programming languages and\nstyle guides suggesting you insert commas and underscores every three\ndigits that are the hacky workarounds!\n\nI think this is a great example of what can be accomplished by knowing\nall the parts of a system so that you can find the best place to\nimplement something. Notably, I didn’t need to understand how OpenType\nshaping worked to come up with the idea. I just had to know it existed\nand have an idea of what it was capable of, so that I knew to go\nresearch it further.\n\nMy work is in performance engineering, where understanding the basics\nof how the entire system works end to end, then diving deep and\nlearning about some overlooked area, is a great way of finding\npotential optimizations. Nearly every day I spend time learning about\na new part of a system, or profit from past learning by coming up with\nan idea based on some thing I vaguely remember reading years ago. I\nthink it’s really valuable to constantly learn about interesting\nthings just outside your usual domain so that you can realize when\nthey might be applicable after all.\n",
        "url"      : "https://blog.janestreet.com/commas-in-big-numbers-everywhere/",
        "image"    : "https://blog.janestreet.com/commas-in-big-numbers-everywhere/numderline_header2.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "What the interns have wrought, 2019 edition",
        "date"     : "August 30, 2019",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : ["internship"],
        "minsToRead" : 14,
"content"  : "Jane Street’s intern program is yet again coming to an end, which is a\nnice opportunity to look back over the summer and see what our interns have\naccomplished.\n\nBetween our 54 interns in New York, London, and Hong Kong (who came\nfrom 32 schools in 13 countries!), there are far too many projects to\nrecount here.  So, instead of doing an overview, I’ll just pick a\nfew different projects to highlight the breadth of work that was done.\n\nThe first project I’ll discuss is latviz, a plotting tool for\nvisualizing latencies; the second is vcaml, a layer for extending\nNeovim in OCaml.  And the third is about implementing an LDAP library.\n\nVisualizing latencies\n\nLots of what happens in a trading infrastructure is the shipping\naround of data, really, a disturbing amount of data. Much of that data\nis marketdata from various exchanges, and a lot of it is data that’s\ngenerated internally, often in response to marketdata updates.\n\nFor these systems to work well, they need to distribute messages with\nlow and deterministic latency. As a result, we spend a lot of time and\neffort trying to measure, analyze, and understand our latencies.  We\ntend to do this by leveraging specialized hardware devices to capture\nand timestamp network traffic, and then do a bunch of post-hoc\nanalysis to actually see what the latencies are.\n\nAnd then, after we’ve done all that work, we tend to visualize this\ndata by generating ASCII-formatted tables containing summaries of the\nlatencies.\n\nSigh.\n\nWhile I love old-school, text-based UIs, they’re sometimes just a poor\nfit.  
Well-designed, interactive visualizations can do a lot to make\nit easier for people to quickly and easily understand what the data is\ntrying to tell them.\n\nLukasz Miskiewicz’s project (mentored by Maciej Debski) was to create\njust such a tool for visualizing latency data, called latviz.\nBecause latency problems are often related to issues that are\nlocalized in time, like data rate spikes, or network configuration\nissues, it’s often useful to be able to see how latencies vary over\nthe day.  One of the goals of latviz was to make the time-dependent\nnature of latency data easy to see, unlike our ordinary ways of\npresenting the data.\n\nHere’s what the final result looks like.\n\n\n\n\nLatviz organizes data temporally, combining a histogram showing the\ndistribution of latencies at each time, and a plot showing the median\nlatency.\n\n\nThe project had a few different parts.\n\n\n  \n    A high-level and type-safe wrapper over\nHighcharts, a widely used JavaScript\ngraphing library.  We write our web applications in OCaml, which\nallowed Lukasz to use some more advanced features of the OCaml type\nsystem (specifically,\nGADTs)\nto make the API less error-prone.\n  \n  \n    The latviz webclient, based on the aforementioned Highcharts library\nand the Incr_dom\nframework.\n  \n  \n    A back-end which allows us to efficiently compute the statistics\nrequired to generate the visualization.  This was based on a\ncommercial column-oriented database, and required some cleverness\naround the construction of the tables and the queries to make it all\nrun acceptably fast.\n  \n  \n    A tool for loading latency data into the system.  
One of the\ndifficulties here is that our latency data is based on network\ncaptures, which means there’s a ton of data to rip through.\nAccordingly, Lukasz built a tool that could efficiently down-sample\nthe data, using reservoir\nsampling, and\nsend the resulting data to the latviz server.\n  \n\n\nAll told, we’re excited about the results.  It makes it easier to see\nwhat’s going on, and the tool is efficient enough that you can tweak\nthe way in which the data is put together (for example, changing the\npercentiles that are plotted, or the size and distribution of buckets\nin the histogram), and get back an updated visualization almost\nimmediately.\n\nRight now the tool has only been applied to a single system, but it’s\nalready helped us find one real performance regression, and we hope\nthat over the coming months we can make latviz available more\nbroadly.\n\nAutomating Vim\n\nWe care a lot about dev-tools.  Developer time is precious, and great\ntools make developers more efficient, not to mention happier.\n\nAnd there are lots of different pieces of infrastructure we work on to\nimprove developers’ lives: the OCaml compiler, documentation\ngenerators, continuous-integration systems, code review tools, testing\nframeworks, and so on.  But if I had to pick one tool that is most\nintertwined with developers’ daily lives and about which they have the\nstrongest feelings, I think it would have to be their editor.\n\nAccordingly, we’ve spent a lot of time over the years improving the\neditor experience here, and doing a lot of work to integrate our\nvarious dev-tools seamlessly into the editor.  This was highlighted by\nthis\npost\non Feature Explorer, which is our integration of Iron, our code review\nsystem, into Emacs.\n\nAnd, there’s the rub: most of this integration work has been done in\nEmacs.  
Emacs is the editor that we recommend to new hires, and I once\nhoped that we’d eventually make the Emacs experience good enough that\npeople wouldn’t clamor for us to support anything else.\n\nThat was a vain hope.  Vim users are a\nstubborn and resourceful bunch, and we’ve always had a solid minority\nof developers who use Vim, mostly figuring out their own\ncustomizations and integrations to make it nicer to use Vim in our\nenvironment.\n\nRecently, our tools group has decided to start taking on more\nresponsibility for Vim tooling.  And the first step there is to make\nit easier to create tooling for Vim.  Currently, if you want to\ncustomize Vim, you need to write vimscript, a programming language\nthat I’ve never heard anyone speak kindly about.\n\nOn the Emacs side, we dealt with a similar problem, in that elisp,\nwhile not nearly as bad a language as vimscript, is not a great\ntool for writing and maintaining complex extensions.  That’s why we\ncreated ecaml, a set of\nlibraries that target Emacs’ FFI so you can write your extensions\nentirely in OCaml. (Amusingly, the first version of ecaml was also an\nintern project, done by Aaron Zeng, now a full-timer!)  Today,\nessentially all of our Emacs extensions are written in OCaml.\n\nThis summer, intern Cameron Wong (mentored by Ty Overby) worked on\nadding the equivalent functionality for Vim (or really, Neovim), in\nthe form of a library called vcaml.\n\nThe reason they chose Neovim rather than Vim is that the former has an\nasynchronous RPC-based mechanism (built on\nMessagePack) that lets you control\nthe editor at a fairly low level.  We wanted to write high-level OCaml\nbindings to this, so the first step was to dump all of the APIs and\ntheir types.\n\nFrom this, Cameron generated low-level (and somewhat type-safe)\nbindings for all the APIs, after first using the\nAngstrom and\nFaraday libraries for\nbuilding parsers and printers for MessagePack.  
The implementations of\nthese APIs wind up serializing all of the arguments into a MessagePack\nlist, and getting back a MessagePack object to deserialize.\n\nThere is a fly in this ointment, though.  Because this is an RPC-based\nmechanism, you can end up in various race conditions when a sequence\nof operations that you want to happen atomically get interrupted.\nHappily, Neovim has a workaround for this: a call_atomic function,\nwhich takes the arguments for multiple functions and calls them all\nin sequence without allowing anything else to interrupt.\n\nThis leaves you, unfortunately, with a somewhat awkward API design\nquestion, in that it’s not totally clear how to build an easy-to-use,\nstrongly typed interface that lets you combine multiple RPC\ncalls into one, while still giving easy access to the results of the\nconstituent calls.\n\nThe solution, it turns out, was to turn this into an\nApplicative.\nThe applicative interface, along with the\nppx_let syntax extension,\nprovides an easy-to-use and type-safe interface for constructing such\ncalls.\n\nFor a somewhat artificial example, let’s say we want to build an API\ncall that atomically gets the first and last lines of a\nbuffer.  The Buf.get_lines call will let us query lines from\nanywhere in the buffer, and we can use the applicative let-syntax to\njoin those together into a single call that atomically grabs both\nvalues.\n\nlet get_first_and_last buffer =\n  let open Api_call.Let_syntax in\n  let%map first = Buf.get_lines ~buffer ~start:0 ~end_:1 ~strict_indexing:true\n  and last = Buf.get_lines ~buffer ~start:(-1) ~end_:0 ~strict_indexing:true in\n  first, last\n;;\n\n\n\nIt’s still early days, but we’re excited about the potential this has\nto improve our tooling around Vim.  
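For readers who haven’t met applicatives before, here is a minimal, self-contained sketch of the batching idea (toy types of my own, not vcaml’s actual API): building a value records the requests it needs, and running it answers them all in one batch, much as call_atomic does, threading each response back to its call site.

```ocaml
(* A toy batched-call applicative (a sketch, not vcaml's real interface). *)
type 'a t =
  | Pure : 'a -> 'a t
  | Call : string * (string -> 'a) -> 'a t   (* request, parse response *)
  | Map2 : ('a -> 'b -> 'c) * 'a t * 'b t -> 'c t

let return x = Pure x
let call request ~parse = Call (request, parse)
let both a b = Map2 ((fun x y -> (x, y)), a, b)

(* Collect every request in the tree, left to right. *)
let rec requests : type a. a t -> string list = function
  | Pure _ -> []
  | Call (r, _) -> [ r ]
  | Map2 (_, a, b) -> requests a @ requests b

(* Answer all requests with one call to [batch], then reassemble results. *)
let run (batch : string list -> string list) t =
  let responses = ref (batch (requests t)) in
  let next () =
    match !responses with
    | [] -> failwith "missing response"
    | r :: rest -> responses := rest; r
  in
  let rec eval : type a. a t -> a = function
    | Pure x -> x
    | Call (_, parse) -> parse (next ())
    | Map2 (f, a, b) ->
      let x = eval a in
      let y = eval b in
      f x y
  in
  eval t
```

For example, `run (List.map String.uppercase_ascii) (both (call "a" ~parse:Fun.id) (call "b" ~parse:Fun.id))` sends both requests in one batch and returns `("A", "B")`.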
And our plan is to release all of\nthis as open source in the coming months.\n\nReimplementing LDAP\n\nIf you just know about Jane Street from our tech blog, you might think\nwe do everything on Linux, but that’s far from the case.  While most\nof our home-grown software runs on Linux, we have a big Windows\ninfrastructure, mostly oriented towards supporting our Windows desktop\nmachines.\n\nPart of supporting that infrastructure is interfacing with Active\nDirectory,\nand in particular, having the ability to communicate with the DCs\n(Domain Controllers); and ideally, we want to be able to do so from\nour ordinary OCaml codebase.\n\nAnd indeed, we’ve long had a solution for doing so. DCs speak an open\nprotocol called LDAP (short for Lightweight Directory Access\nProtocol), and for the last 8 years, we’ve used\nOcamldap, an open-source\nOCaml library, for speaking LDAP to the DCs.\n\nOcamldap has been enormously useful, but there are some problems with\nit as well.\n\n\n  It makes heavy use of Unix alarms instead of using Async, which\nmakes it hard to integrate into our programs, often leaving us with\nweird crashes and segfaults.\n  It implements a bunch of basic services on its own, like SSL, in its\nown custom way.  That was necessary when Ocamldap was written, but\ntoday, there are a bunch of new libraries for doing this kind of stuff\nthat work better than the custom implementations in Ocamldap.\n  The basic design is exception-heavy, which leads to surprising and\nsometimes uninformative messages when things go wrong.\n  It’s not actively developed.  It’s been responsibly maintained over\nthe years, but there’s not a lot of serious new development at this\npoint.\n\n\nThe project we gave to intern Xueyuan Zhao (mentored by Nolen Royalty)\nwas to build a new LDAP implementation to replace ocamldap.\n\nThere were a bunch of parts that went into the project.  LDAP makes\nuse of ASN.1 and is encoded using\nBER.\nHappily, the spec\nis reasonably straightforward.  
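To give a flavor of BER’s tag-length-value encoding (a generic sketch, not code from the project), here is how definite-form lengths work: a length under 128 fits in a single byte, while a longer length gets a prefix byte of 0x80 lor n followed by the length itself in n big-endian bytes.

```ocaml
(* BER definite-form length encoding (generic illustration). *)
let encode_ber_length len =
  if len < 0 then invalid_arg "encode_ber_length"
  else if len < 0x80 then Bytes.make 1 (Char.chr len)
  else begin
    (* How many bytes does the big-endian representation need? *)
    let rec num_bytes n = if n = 0 then 0 else 1 + num_bytes (n lsr 8) in
    let n = num_bytes len in
    let b = Bytes.create (n + 1) in
    Bytes.set b 0 (Char.chr (0x80 lor n));
    for i = 1 to n do
      Bytes.set b i (Char.chr ((len lsr (8 * (n - i))) land 0xff))
    done;
    b
  end

let hex b =
  String.concat " "
    (List.map (fun c -> Printf.sprintf "%02x" (Char.code c))
       (List.of_seq (Bytes.to_seq b)))

let () =
  print_endline (hex (encode_ber_length 38));   (* prints: 26 *)
  print_endline (hex (encode_ber_length 435))   (* prints: 82 01 b3 *)
```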
The mirage folks\nhave a great asn1 library\nalready, so the first step was creating types to represent the\nprimitives described in the RFC using their combinators.  One tricky\nthing here is that search filters are recursive.\n\nIn OCaml, you could write a recursive type for a simple filter\nlanguage like this:\n\nmodule Search_filter = struct\n  type t =\n    | And of t list\n    | Or of t list\n    | Equals of string\nend\n\n\n\nIn the ASN.1 library, recursion is a little less straightforward.  In\nparticular, this requires use of the fixpoint combinator to allow us\nto construct a parser that refers to itself, as shown below.\n\n  let asn =\n    fix\n    @@ fun asn -&gt;\n    choice10\n      (* and *)\n      (implicit 0 (set_of asn))\n      (* or *)\n      (implicit 1 (set_of asn))\n      (* not *)\n      (explicit 2 asn)\n      (* equalityMatch *)\n      (implicit 3 Attribute_value_assertion.asn)\n      (* substrings *)\n      (implicit 4 Substring_filter.asn)\n      (* greaterOrEqual *)\n      (implicit 5 Attribute_value_assertion.asn)\n      (* lessOrEqual *)\n      (implicit 6 Attribute_value_assertion.asn)\n      (* present *)\n      (implicit 7 Attribute_description.asn)\n      (* approxMatch *)\n      (implicit 8 Attribute_value_assertion.asn)\n      (* extensibleMatch *)\n      (implicit 9 Matching_rule_assertion.asn)\n  ;;\n\n\n\nOnce we came up with a way to encode and decode LDAP, we needed to\nconvince ourselves that we were doing the right thing.  We tested our\nnew encodings by writing expect\ntests that\nencoded the same queries using ocamldap and our new jane-ldap library,\nand diffed the results.\n\nIn some cases the encoding that we ended up with was\ndifferent. 
Debugging that was really fun: we’d end up with a test that\nlooked like this:\n\n  let%expect_test _ =\n    let ours = Ldap.Protocol.encode_message msg in\n    let theirs = Ocamldap.Ldap_protocol.encode_ldapmessage msg in\n    print_endline (hex_of_string ours);\n    print_endline (hex_of_string theirs);\n    [%expect\n    {|\n    30 16 02 01 01 61 11 0a 01 00 04 00 04 00 a3 06 04 01 41 04 01 42 87 00\n    30 16 02 01 01 61 11 0a 01 00 04 00 04 00 a3 06 04 01 41 04 01 42 04 00 |}]\n\n\n\nAnd then reverse engineer what the second-to-last byte of the message\ncontained, what we were encoding, and what ocamldap was encoding.  In\nsome cases we found bugs in our code, but in some cases we found bugs\nin ocamldap.\n\nThe final bit of the project, which isn’t quite complete, is to wrap\nup our now-functional parser in a nice client library that doesn’t\nrequire users to read the whole LDAP RFC.  While some Jane Street\nlibraries like async_ssl\nget you some basic plumbing for free, one of our goals was to produce\na native Async library (instead of a library that was only async via\nIn_thread.run and some hope), which means that the library needs to\nhave the ability to track, encode, and decode many different\nqueries/responses concurrently.  The LDAP protocol provides the basic\ntooling to do this - each query is uniquely identified by a ‘message\nid’ - but it adds some extra complexity and state.\n\nAll told, we’re excited about the prospects of this work, and hope to\nget the results open sourced.\n\nYou could be an intern too!\n\nI hope the above project descriptions have given you a sense of the\nscope and diversity of projects interns get to work on.  Jane Street\ninterns get to do real, and really interesting, development work.\n\nSo, if this sounds like fun, you should\napply! 
Jane\nStreet internships are a great learning experience, and a lot of fun.\nAnd if you’re interested in learning more about the application\nprocess, this\nis a pretty good place to start.\n\n(Links to\nReddit\nand HN, if you want\nto comment on the post.)\n",
        "url"      : "https://blog.janestreet.com/what-the-interns-have-wrought-2019/",
        "image"    : "https://blog.janestreet.com/what-the-interns-have-wrought-2019/what_interns_wrought2019.jpg",
        "topic"    :  ["technology","internship"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Using OCaml to drive a Raspberry Pi robot car",
        "date"     : "August 19, 2019",
        "authorId" : "lmazare",
        "author"   : "Laurent Mazare",
        "tags"     : [],
        "minsToRead" : 14,
        "content"  : "Back when the Raspberry Pi was first released in 2012 Michael Bacarella wrote\na blog post\non using OCaml and Async on this little device.\nSince then installing OCaml via opam has become a pretty smooth experience\nand everything works out of the box when using Raspbian – the default Raspberry Pi\ndistribution.\n\nRaspberry Pis are commonly used to teach programming as students get an immediate\nfeeling of how their projects work in the ‘real world’. There are many resources\nfor learning Python on the Raspberry Pi, but we were wondering\nif we could also use this single-board computer to help students learn OCaml\nand the common paradigms that we use here at Jane Street (such as Async\nfor concurrent programming).\nA fun way to do this is to use a Raspberry Pi to drive a small autonomous\nrobot car using OCaml to write the driving logic.\nWe tried this with two different robot car kits:\n\n  The Freenove Three-wheeled Smart Car Kit (F3W),\na three wheeled robot.\n  The Adeept AWR 4WD Smart Robot Car Kit (A4W),\na four wheeled robot.\n\n\nSeparately, we got a Raspberry Pi 3B+ as in both cases the robots came as a kit\nthat that did not include batteries or a Raspberry Pi and the 4B version was not\nyet available at this point. Both kits include a camera and an ultrasonic sensor\nto measure the distance to obstacles.\nSome Python code is provided to illustrate how to interact with the different sensors,\nrun the motors and servos, and use some additional features like LEDs or Buzzer modules.\nThis code can be found on GitHub in the Adeept repo\nand in the\nFreenove repo.\n\nThe main difference between the two kits is that for the F3W, the two front wheeled are motorized\nand a servo is used to change their direction. 
For the A4W, all four wheels are motorized and their\ndirections cannot be changed, so turning is done ‘in place’ by rotating the wheels faster\non one side of the robot and slower (or backwards) on the other side.\n\nAfter a couple of hours we were done with building the robots and could start playing with the\nPython examples. Everything worked smoothly and we moved to the next step: replacing Python\nwith OCaml. Installing the compiler and libraries such as Base or Async via opam\nwas a breeze, but we still had to get the OCaml code to interface with the Raspberry Pi GPIO\n(General-Purpose Input/Output) and to write easy-to-use APIs for the two robots.\n\nF3W: Controlling the Robot via I2C/SMBus\n\nThe F3W robot we built can be seen in the image at the top of this post. It includes a\ncamera, an ultrasonic sensor to measure the distance to the nearest obstacle,\nan RGB LED, and a buzzer. There are two motors connected to each front wheel,\na servo to control the robot direction, and another servo to control the\ndirection of the ultrasonic sensor and/or camera.\n\nThe provided Python examples use\nSMBus through the\nI2C/dev interface, so we wrote some small OCaml bindings to be able to open I2C\ndevices and read/write data. The code for this can be found in the\nocaml-smbus repo on GitHub.\n\nThe Robot API\n\nThe I2C operations got abstracted in the Bus module with a simple\ninterface,\nand on top of this we wrote our robot API. This API looks as follows:\n\ntype t\n\nval create : unit -&gt; t\nval set_rgb : t -&gt; [ `white | `red | `green | `blue | `off ] -&gt; unit\nval set_buzzer : t -&gt; level:int -&gt; unit\nval set_pwm : t -&gt; level:int -&gt; unit\nval get_sonic : t -&gt; float\n\n(** [set_servo1 t v] sets servo 1 to orientation [v] between 0 and 1. *)\nval set_servo1: t -&gt; float -&gt; unit\nval set_servo2: t -&gt; float -&gt; unit\nval set_servo3: t -&gt; float -&gt; unit\n\n\n\nThe create function can be used to connect to the relevant I2C device. 
With\nthe returned value, one can:\n\n  Set the color of the RGB LED via set_rgb.\n  Produce some sound with set_buzzer.\n  Actuate both motors with set_pwm.\n  Get the distance to the closest obstacle (in centimeters) from the ultrasonic sensor using get_sonic.\n  Set the directions of the three servos.\n\n\nHere is a simple program that makes the RGB LED blink in different colors.\n\nlet () =\n  let mdev = Mdev.create () in\n  Mdev.set_rgb mdev `red;\n  Unix.sleep 3;\n  Mdev.set_rgb mdev `green;\n  Unix.sleep 3;\n  Mdev.set_rgb mdev `blue;\n  Unix.sleep 3;\n  (* Turn the LED off. *)\n  Mdev.set_rgb mdev `off\n\n\n\nThe Sonic-Scan Module\n\nAs our ultrasonic sensor orientation can be changed via a servo, we thought that it\nwould be nice to have it run like a rotating radar: the angle of the sensor is\nconstantly adjusted so that we have recent reads for the obstacle distance in\nfront of us, as well as on the left and right sides. The difficulty is that we\nwould also like to run our driving algorithm concurrently, and adjust the robot\ndirection and speed based on recently measured distances. In order to achieve this,\nbeing able to run multiple threads would be helpful. As the Raspberry Pi has\nmore horsepower than typical embedded systems, we used\nAsync to handle concurrency.\nHere is a sketch of how we wrapped the ultrasonic sensor thread in\nits own module. 
The full code can be seen\nhere.\n\nmodule Sonic_scan : sig\n  type t\n  val create : Mdev.t -&gt; angles:float list -&gt; refresh_span:Time_ns.Span.t -&gt; t\n  val distances : t -&gt; float list\nend = struct\n  type t = { mdev : Mdev.t; distances : float array; angles : float list }\n\n  let refresh t =\n    Deferred.List.iteri t.angles ~f:(fun i angle -&gt;\n        Mdev.set_servo2 t.mdev angle;\n        let%map () = after (Time.Span.of_sec 0.1) in\n        t.distances.(i) &lt;- Mdev.get_sonic t.mdev)\n\n  let create mdev ~angles ~refresh_span =\n    let distances = Array.init (List.length angles) ~f:(fun _ -&gt; Float.nan) in\n    let t = { mdev; distances; angles } in\n    Clock_ns.every' refresh_span (fun () -&gt; refresh t);\n    t\n\n  let distances t = Array.to_list t.distances\nend\n\n\n\nThe Sonic_scan.create function is used to create and start the module. It is given\nthe list of angles at which to monitor the distance as well as some target refresh\nrate. The Sonic_scan.distances function returns the last measured distances.\nMost of the Async interaction is abstracted so that the module user does not have\nto know about the implementation details.\n\nThe driving logic can then use this module to adjust the direction and speed of\nthe car over time:\n\n  If the front distance is less than 20 cm, we stop the car.\n  If it is more than 60 cm, we drive straight ahead.\n  Otherwise, if the distance measured on the left is shorter than the one\non the right, we adjust the direction to turn right; conversely, if the\nright distance is shorter than the left, the direction is adjusted to turn\nleft.\n\n\n  let right_dist, center_dist, left_dist =\n    match Sonic_scan.distances sonic_scan with\n    | [ x; y; z ] -&gt; x, y, z\n    | _ -&gt; failwith \"unexpected distance list length\"\n  in\n  let pwm_level, angle =\n    if Float.( &lt; ) center_dist 20. then 0, 0.5\n    else if Float.( &gt; ) center_dist 60. 
then 400, 0.5\n    else if Float.( &lt; ) right_dist left_dist then 300, 0.3\n    else 300, 0.7\n  in\n  Mdev.set_pwm mdev ~level:pwm_level;\n  Mdev.set_servo1 mdev angle\n\n\n\nUsing this very naive logic, we got our car to run through a corridor and avoid\nthe walls (most of the time).\nThis logic can certainly be improved to be more robust, e.g.\nusing a PID controller.\n\nA4W: Using the Raspberry Pi GPIO directly\n\nThe A4W robot is smaller than the F3W. It is also easier to control as the\ndirection is fixed. The four motors can be controlled independently and the\nultrasonic sensor also has a fixed orientation looking straight ahead.\n\n\n\nThe Python examples for the A4W use the PyPI RPi.GPIO package.\nThe underlying C API works by memory-mapping the /dev/gpiomem device and directly reading from or\nwriting to it.\nOnce again we wrote some OCaml wrappers around these low-level features; the related code\ncan be found in the\nocaml-rpi-gpio GitHub repo.\n\nThe resulting API ends up being fairly simple and uses\nphantom types\nfor improved type safety.\ntype _ t\n\nval create_input : channel:int -&gt; mode:[ `bcm | `board ] -&gt; [ `input ] t\nval create_output : channel:int -&gt; mode:[ `bcm | `board ] -&gt; [ `output ] t\nval input : [ `input ] t -&gt; int\nval output : [ `output ] t -&gt; int -&gt; unit\nval close : _ t -&gt; unit\n\n\n\nThe robot API then consists of two modules. The Ultrasonic module handles measuring the\ndistance to the closest obstacle. 
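As a quick aside, the phantom-type trick in the GPIO signature above can be illustrated with a tiny self-contained sketch (toy names and bodies of my own, not ocaml-rpi-gpio’s actual code): the otherwise-unused type parameter records a pin’s direction, so writing to an input pin is rejected at compile time.

```ocaml
(* Toy phantom-typed pin handles. The ['a] parameter never appears in the
   record, so it exists purely to tag each pin as input or output. *)
type 'a pin = { channel : int }

let create_input ~channel : [ `input ] pin = { channel }
let create_output ~channel : [ `output ] pin = { channel }

(* Stand-ins for the real register reads/writes. *)
let input (pin : [ `input ] pin) = pin.channel mod 2
let output (pin : [ `output ] pin) value =
  Printf.printf "pin %d set to %d\n" pin.channel value

let () =
  let led = create_output ~channel:18 in
  output led 1;
  (* [output (create_input ~channel:4) 1] would be a type error, since an
     [ `input ] pin is not an [ `output ] pin. *)
  ignore (input (create_input ~channel:4))
```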
Here the OCaml code measures the elapsed time\nbetween sending the ultrasound pulse and receiving it back, and assumes a speed of\nsound of 340 meters per second to return a distance in meters.\nThe Motors module makes it easy to set the motors to run at some given speed: speed_left\nis for the left-hand side motors and speed_right for the right-hand side motors.\n\nmodule Ultrasonic : sig\n  type t\n  val create : unit -&gt; t\n  val get : t -&gt; float\nend\n\nmodule Motors : sig\n  type t\n\n  val create : unit -&gt; t\n  val move : t -&gt; speed_left:float -&gt; speed_right:float -&gt; unit\n  val reset : t -&gt; unit\nend\n\n\n\nThe full code for this can be found in\nawr.ml.\n\nA very simple example use of these modules consists of getting the robot to move forward\nas long as the closest obstacle is at least 60 cm away. When an obstacle is detected\ncloser than this threshold, the robot turns right until there is no obstacle.\n\n  let motors = Motors.create () in\n  let ultra = Ultrasonic.create () in\n  while true do\n    Unix.sleepf 0.1;\n    let d = Ultrasonic.get ultra in\n    let speed_left, speed_right =\n      if Float.(d &lt; 0.1) then 0., 0.\n      else if Float.(d &lt; 0.6) then 70., -70.\n      else 100., 100.\n    in\n    Motors.move motors ~speed_left ~speed_right\n  done\n\n\n\nOnce again, this navigation algorithm is very basic, but it already gets the robot running\naround and avoiding obstacles.\n\nFollow-Ups\n\nUsing OCaml on these Raspberry Pi robots was a very interesting experience,\nallowing us to learn about the low-level bits of how GPIO works as well as other kernel\nsubsystems. We also hope that this can be leveraged in new fun ways to learn\nOCaml.\n\nThere are several things that we would like to explore next:\n\n\n  We didn’t use the embedded cameras from our robot. 
Using some computer vision\nmodels would give the driving algorithm far more context on the outside world.\nWe have an OCaml implementation of the\nYOLOv3 object detection model;\nhowever, the Raspberry Pi is a bit underpowered for such deep learning models,\nso it might be worth upgrading to a JetBot with its embedded GPU.\n  Lidar modules have become much less expensive in the past few years. These modules yield far more robust\ndistance estimations compared to ultrasonic sensors. This would make for a nice addition to our robot cars.\n  We only implemented some very simple driving logic. It would be interesting to\nexperiment with more complex algorithms, for example trying to achieve loop closure in\nsimultaneous localization and mapping.\n\n",
        "url"      : "https://blog.janestreet.com/using-ocaml-to-drive-a-raspberry-pi-robot-car/",
        "image"    : "https://blog.janestreet.com/using-ocaml-to-drive-a-raspberry-pi-robot-car/robot-pi.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Do applied programming languages research at Jane Street!",
        "date"     : "August 16, 2019",
        "authorId" : "lwhite",
        "author"   : "Leo White",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "As our Tools & Compilers team has grown, the kinds of projects we work\non has become more ambitious. Here are some of the major things we’re\ncurrently working on:\n\n\n  Unboxed types\n  Feedback-directed optimization\n  Typed algebraic effects\n  Design of IRs for inlining and optimization\n\n\nAnd we’re considering future work on:\n\n\n  Modular implicits\n  Macros and staging\n  Rust-style ownership\n  Supporting inductive families in the module system\n\n\nAll of these involve interesting, research-level questions, and\nthey’re all things that we think could make for important, practical\nimprovements to the utility of OCaml, both for our uses, and more\ngenerally.\n\nWe’re really excited about making OCaml an ever more practical and\nbeautiful language. To help us tackle these more ambitious projects,\nwe are looking to hire people with a background in programming\nlanguage research and development.\n\nIf that sounds exciting to you, you can apply\nhere; look for\nthe “compiler engineer” roles!\n",
        "url"      : "https://blog.janestreet.com/applied-PL-research/",
        "image"    : "https://blog.janestreet.com/applied-PL-research/compiler3d.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "A look at OCaml 4.08",
        "date"     : "July 12, 2019",
        "authorId" : "lwhite",
        "author"   : "Leo White",
        "tags"     : [],
        "minsToRead" : 17,
        "content"  : "Now that OCaml 4.08 has been released, let’s have a look at what was\naccomplished, with a particular focus on how our plans for\n4.08 fared. I’ll mostly focus on work that we\nin the Jane Street Tools & Compilers team were involved with, but we are\njust some of the contributors to the OCaml compiler, and I’ll have a\nquick look at the end of the post at some of the other work that went\ninto 4.08.\n\nIn the end, 4.08 had a very late freeze, but here at Jane Street we\nmostly moved on to other things once the originally planned freeze date\nin October had passed. That meant planned features that weren’t ready by\nthen were mostly put to one side and left for a later version.\n\nEven for the things that were part of “our” plan, the work to make that\nplan happen came from both inside and outside of Jane Street. And not\nall of that work is writing code! In particular, we owe thanks to all\nthe people in the larger OCaml development team who gave us feedback on\nour proposals, reviewed our pull requests, and helped to get things\nmerged upstream.\n\nPlanned work\n\nSimple support for GADTs in or-patterns\n\nThis was merged upstream in\nPR#2110 following a series\nof prerequisite pull requests that changed technical details of how\npatterns are type-checked\n(PR#1745,\nPR#1748,\nPR#1909,\nPR#2317). In 4.08 it will\nbe possible to write:\n\ntype 'a ty = Int : int ty | Bool : bool ty | String : string ty\n\nlet is_string : type a. a ty -&gt; bool = function\n  | Int | Bool -&gt; false\n  | String -&gt; true\n\n\n\nIt still won’t be possible to write or-patterns whose cases rely on\nequations from the GADT, for example:\n\ntype 'a ty = Int1 : int ty | Int2 : int ty | String : string ty\n\nlet five : type a. a ty -&gt; a = function\n  | Int1 | Int2 -&gt; 5\n  | String -&gt; \"five\"\n\n\n\nis still rejected in 4.08. 
Special thanks to Jacques Garrigue for\nreviewing this work.\n\nShadowing of items from include\n\nThis was merged upstream as\nPR#1892. In 4.08 it will\nnow be possible to write:\n\ninclude Foo\nmodule Bar = struct ... end\n\n\nwhen Foo contains a Bar module, which is much easier than the old way:\n\ninclude (Foo : module type of struct include Foo end with module Bar := Foo.Bar)\nmodule Bar = struct ... end\n\n\n\nPrivate structure and signature items\n\nIn PR#2016 we proposed\nsupporting having items in signatures and structures that were not\nincluded in the resulting module or module type, using the syntax:\n\nprivate type t = int list\nprivate module L = List\n\n\n\nAn existing pull request,\nPR#1506 by Runhang Li and\nJeremy Yallop at OCaml Labs, had implemented an alternative feature that\ncan be used to handle similar cases. It added support for using\narbitrary module expressions in open statements. For instance, the above\nexample could be expressed as:\n\nopen struct\n  type t = int list\n  module L = List\nend\n\n\n\nThere was some debate about whether both these features should be\nsupported, or if only one should be included in the language. In the end\nit was decided that the open feature made more sense in structures, but\nthe private feature made more sense in signatures. It was also decided\nto give a different syntax to the private feature.\n\nBased on these discussions we wrote\nPR#2122, which\nre-implemented the private feature to work only in signatures and use\nthis syntax:\n\ntype t := int list\nmodule L := List\n\n\n\nreminiscent of destructive substitution. Then we wrote\nPR#2147, which\nre-implemented the open feature to restrict its use in signatures and to\nshare some implementation with\nPR#2122. 
These were both\nmerged and will appear in 4.08.\n\nImprove type propagation in lets\n\nWe had hoped to change the order of type-checking the components of let\nexpressions, so that:\n\nlet p : t = e\n\n\n\nwould be checked as t then e then p, rather than the current order\nof t then p then e. This matters for order-dependent features such\nas constructor disambiguation.\n\nUnfortunately, this turned out to be more work than we expected. The\nexisting code for typing let only has to deal with passing so-called\nprincipal type information – in this case information coming from type\nannotations rather than from inferring the types of expressions – to\nits patterns. The code for typing match knows how to do this, but it’s\nnot easy to extract the relevant parts so that they can be shared.\n\nWe decided to leave this change for now until we have time to make the\nmore invasive changes required.\n\nStrengthening the module system\n\nWe had hoped to add a notion of transparent ascription to the module\nlanguage, which is an operation:\n\nmodule M = (N &lt;: S)\n\n\n\nthat restricts M to the elements of the module type S, but it is\nstill known to be equal to N. This would allow us to keep more type\nand module equations in the module types produced in various parts of\nthe type system.\n\nUnfortunately, none of the user-visible parts of this work made it into\n4.08. Some large internal changes to support this work did get in:\n\n\n  PR#1610, which removes\npositions from paths, was merged.\n  PR#2127, which refactors\nhow the type environment handles looking up values, still needs\nsome work before being merged.\n\n\nThe main piece of work that underpins all the proposed changes is the\naddition of transparent ascriptions. This has proved fairly awkward to\nimplement in a satisfying way, which has prompted both the above pull\nrequests that try to tidy up some parts of the type system that were\nmaking the implementation awkward. 
The work is probably 80% done now, so\nit should be relatively easy to get it into a later version of OCaml\nonce we start working on it again.\n\nFlambda\n\nWork towards making flambda classic mode the default compilation mode\n\nWe tested and benchmarked -Oclassic mode on 4.07. We were happy with the\ncompile-time performance and correctness, but there were some situations\nwhere classic mode produced worse code than the non-flambda compiler. In\nparticular, the handling of recursive functions was not as good. We\ndecided that it would be better to wait for the work on improving the\ninlining heuristics for recursive functions, as that would allow us to\nfix this issue properly.\n\nSince work has also been progressing quickly on “Flambda 2.0”, which\nessentially has a completely new version of classic mode, we’re no\nlonger planning to push for having the existing classic mode as the\ndefault.\n\nImproved inlining heuristics for recursion\n\nLuke Maurer’s internship work on the inlining heuristics for recursion\nis still waiting on some improvements that are needed before it can be\nupstreamed. We’re now hoping to get that into OCaml 4.10.\n\nPierre Oechsel’s internship work on improving the stability of the\ninlining heuristics, and improving the support for displaying the\nresults of these heuristics to users, is based on top of Luke Maurer’s\nbranch, so it is also delayed until at least OCaml 4.10.\n\nImproved DWARF and GDB support\n\nSome parts of the DWARF support were merged into 4.08, but then we\ndecided to rewrite some of it to produce better behaviour in gdb. The\nrewritten patches have been submitted as pull requests\n(PR#2280,\nPR#2281,\nPR#2286,\nPR#2290,\nPR#2291,\nPR#2292,\nPR#2294,\nPR#2300,\nPR#2303,\nPR#2305,\nPR#2308,\nPR#2316,\nPR#8614), some of which\nhave been merged. However, there has been some disagreement with\nupstream about the scale of some of these changes vs. their\nbenefit. 
Addressing these concerns will probably require significant\nwork, so we are going to put the gdb work to one side for now until we\nhave enough spare cycles to get things into a state that is acceptable\nupstream.\n\nMove the parser to Menhir\n\nPR#292 by Gabriel Scherer\nat INRIA, Nicolás Ojeda Bär at Lexifi and Frédéric Bour at Facebook,\nwhich we helped to test and review, was merged into 4.08. It replaces\nthe parser with one based on the Menhir parser generator, but it does\nnot yet take advantage of Menhir’s advanced error handling: so the\nsyntax errors remain as uninformative as before.\n\nAdd unsigned integer operations\n\nWe helped a little with getting Nicolás Ojeda Bär’s\nPR#1458 merged. It added\nunsigned integer operations to the standard library’s Int32, Int64 and\nNativeint modules. The initial implementation uses OCaml implementations\nof these operations. We also wrote code generation for implementing the\noperations in native code but this has not yet been upstreamed.\n\nUnplanned work\n\nIn addition to all the work we did trying to implement our planned\nfeatures, we also did a lot of work that was, for one reason or another,\nnot on our original plan.\n\nMonadic let operators\n\nPR#1947 adds support for\n“monadic” let operators to the language. These essentially bring\nlet%bind and let%map from\nppx_let to the language\nitself. This was implemented somewhat on a whim, and quickly devolved\ninto a very long discussion about syntax. Despite this it did make it\ninto OCaml 4.08, so it is now possible to write things like:\n\nlet ( let* ) o f =\n  match o with\n  | None -&gt; None\n  | Some x -&gt; f x\n\nlet return x = Some x\n\nlet find_and_sum tbl k1 k2 =\n  let* x1 = Hashtbl.find_opt tbl k1 in\n  let* x2 = Hashtbl.find_opt tbl k2 in\n    return (x1 + x2)\n\n\n\nFixing “levels” and “scopes”\n\nIn OCaml 4.07.0 we refactored some important parts of how GADTs are\nimplemented. This allowed us to implement disambiguation for GADT\nconstructors. 
However, one of our changes introduced a bug visible in\nthe reported issues\nPR#7822,\nPR#7833 and\nPR#7835 as well as an\nunreported soundness issue that affected enough of our code that we\nrolled back to OCaml 4.06.1 internally.\n\nTo get things back into a safe state we reverted a small part of the\nchange from 4.07.0 in\nPR#1997 along with a\ncouple of other small fixes. This was released in OCaml 4.07.1.\n\nThe underlying cause for this bug was an awkward invariant around the\nrepresentation of bound identifiers in the type-checker. This invariant\nwas needed to correctly implement the “The type constructor foo would\nescape its scope” error. The need for this invariant came from using a\nsingle number (the “stamp”) to serve two different roles. In\nPR#1980 we split this\nnumber into two numbers (a “stamp” and a “scope”), eliminating this\ninvariant and hopefully preventing similar bugs from appearing in the\nfuture.\n\nRefactor lookup functions\n\nWhen implementing transparent ascription, we needed to make some changes\nto the functions that look up identifiers from the type\nenvironment. These functions were a bit convoluted, so in\nPR#2127 we rewrote how\nthese functions work to make things clearer. This patch has not yet been\nmerged upstream.\n\nMake Dynlink sound\n\nOCaml’s Dynlink module, which provides support for dynamic linking,\nhas never done enough checking to ensure that loaded modules can safely\nbe linked into the program. This has produced many bug reports:\nPR#4208,\nPR#4229,\nPR#4839,\nPR#6462,\nPR#6957,\nPR#6950, etc.\n\nWe finally fixed these issues in\nPR#106 which rewrote all of\nthe module tracking done by Dynlink to ensure that modules are only\nloaded if it is safe to do so. Unfortunately, the checks we implemented\nwere a little overzealous and broke Coq’s plugin mechanism\n(PR#7876).\nPR#2176 fixed that by\nmaking the checks a little more accurate. 
These pull requests were\nmerged into 4.08.\n\nChange representation of class signatures\n\nAs part of adding support for disambiguating GADTs in OCaml 4.07 we had\nto make a small change to how classes and object literals were\ntype-checked. Unfortunately this introduced some bugs including\nPR#7894. These bugs are\nvery similar to previous bugs in the same part of the type-checker such\nas PR#5498. We decided\nthat changing the representation of class signatures within the type\nchecker would resolve these kinds of bugs once and for all.\n\nWhilst making this change we discovered a number of other issues in this\npart of the compiler, and the resulting pull request\nPR#8516 became quite\ninvolved. This patch has not yet been merged upstream.\n\nRefactor the construction of the initial environment\n\nIn OCaml 4.07 the standard library was put into a single Stdlib\nmodule. This Stdlib module is opened by default so that, for example,\nStdlib.List.map is still available as List.map. However, that meant\nthat modules from Stdlib would always shadow other modules with the same\nname (PR#7841). To fix\nthis, PR#2041 changed how\nexternal modules are represented in the type environment and how the\ninitial type environment is constructed. This fix was included in 4.08.\n\nExceptions under or-patterns\n\nBack in 2015 we implemented support for having exception patterns within\nor-patterns (PR#305). This\nallows you to write things like:\n\nlet get (t : int option String.Table.t) (key : string) =\n  match Hashtbl.find_exn t key with\n  | Some x -&gt; x\n  | None | exception Not_found -&gt; 0\n\n\n\nThis was merged upstream in time for OCaml 4.03, but there were problems\nwith its implementation\n(PR#7083) and it was\nreverted. 
Last year we finally found time to rewrite the implementation\n(PR#1568) and the feature\nwas merged for OCaml 4.08.\n\nReproducible builds\n\nAs part of work on improving our build times via various forms of\ncaching we’ve suddenly become very interested in getting OCaml builds to\nbe fully reproducible. With that aim in mind we made a number of patches\nto the compiler:\nPR#1845,\nPR#1856,\nPR#1869,\nPR#1930. These were\nincluded in OCaml 4.08.\n\nWork done outside of Jane Street\n\nThe Jane Street Tools & Compilers team are just some of the contributors\nto the OCaml compiler. A lot of work on the OCaml compiler is done\noutside of Jane Street. The\nChanges file\nincludes a full list of everything that’s gone into OCaml\n4.08. Highlights include…\n\nNew notion of “alerts” that generalizes deprecation warnings\n\nIn PR#1804 Alain Frisch,\nfrom LexiFi, added support for a new alert attribute:\n\nval foo: int -&gt; int\n  [@@alert unsafe \"Please use bar instead!\"]\n\n\n\nwhere unsafe is just an arbitrary lowercase identifier. Any uses of foo\nwould then produce a warning:\n\nAlert unsafe: foo\nPlease use bar instead!\n\n\n\nThese warnings can then be turned on and off also using attributes:\n\nlet y = foo 5 [@alert \"-unsafe\"]\n\n\n\nor using the command-line as\n\nocamlopt ... -alert -unsafe ...\n\n\n\nThe existing attribute [@deprecated \"msg\"] is now just sugar for\n[@alert deprecated \"msg\"].\n\nNew modules for the standard library\n\nThe changes to how the standard library is packaged in OCaml 4.07 made\nit much easier to add new modules to the standard library without\nbreaking things. 
This prompted the addition of a number of new modules to\nthe standard library by Daniel Bünzli:\n\n\n  Fun (PR#2129)\ncontaining functions for working with functions\n  Bool (PR#2010)\ncontaining functions for working with bools\n  Int (PR#2011)\ncontaining functions for working with ints\n  Option (PR#1940)\ncontaining functions for working with options\n  Result (PR#1956)\ncontaining functions for working with results\n\n\nHere at Jane Street everyone uses Core and Base, which have always had\nsuch luxuries, but for those of us who occasionally have to make do with\nthe standard library – for example when working on the compiler itself\n– these are exciting additions.\n\nImproved error messages\n\nThere were many pull requests, mostly from Florian Angeletti or Armaël\nGuéneau, to improve the quality of error messages from the type-checker:\nPR#1720,\nPR#1733,\nPR#1993,\nPR#1998,\nPR#2058,\nPR#2094,\nPR#2140,\nPR#6416,\nPR#1120.\n\nSwitching the configure system to autoconf\n\nPR#2044,\nPR#2059,\nPR#2113,\nPR#2115 and\nPR#2139 by Sébastien\nHinderer from INRIA replaced the old hand-rolled configure system with\none based on autoconf. This should make it easier to maintain, and\nhopefully pave the way for much better cross-compilation support.\n",
        "url"      : "https://blog.janestreet.com/a-look-at-ocaml-4.08/",
        "image"    : "https://blog.janestreet.com/a-look-at-ocaml-4.08/ocaml_release-2019.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Of Pythons and Camels",
        "date"     : "July 9, 2019",
        "authorId" : ["lpuchallafiore","lmazare"],
        "author"   : null,
        "tags"     : [],
        "minsToRead" : 11,
"content"  : "Welcome to another post in our series on how to use OCaml for machine learning.\nIn previous posts we’ve discussed artistic style-transfer and\nreinforcement learning. If you haven’t read these, feel\nfree to do so now; we’ll wait right here until you’re done. Ready? Ok, let’s\ncontinue …\n\nIn this post, we follow the lines of the PyTorch Transfer Learning Tutorial.\nTransfer Learning has become an essential building block of machine learning. In\norder to build efficient models on a small dataset, the idea is to reuse some\nmodel trained on a large generic dataset and then specialize it to work on the\nsmaller - different but related - task. This can significantly cut down the\namount of GPU/CPU time needed to train the final model, and the amount of\ntraining data required. The PyTorch tutorial uses a deep Convolutional Neural\nNetwork (CNN) model trained on the very large ImageNet dataset (composed of more\nthan one million pictures spanning over a thousand classes) and uses this model\nas a starting point to build a classifier for a small dataset made of ~200\nimages of ants and bees.\n\nPythons and Camels\n\nWe will build a similar classifier based on a pre-trained CNN but rather than\nusing it to separate images of ants from bees, we will use it to separate two\ndifferent kinds of animals: pythons and camels. In the PyTorch tutorial the\nimages of ants and bees come from a subset of ImageNet - and the network itself\nis trained on ImageNet. Here we use a different dataset to get images of pythons\nand camels, the Open Images Dataset V5.\n\nThis dataset contains categories related to pythons, and categories related to\ncamels. Overall, we extract 249 images of pythons and 822 images of camels. Some\nrandomly selected images are represented below.\n\n\n\n\nCaption: Example images of pythons and camels. 
From left to right, top to bottom:\nDon't come closer by Alias 0591,\nBig Bad Boa by Stacy Arrington,\nPiton by Eric Caballero,\nCamels by stevebrownd50,\nCamel, close-up by Irene2005,\nand hey there fella by Reinis Traidas.\nAll images used under CC BY 2.0 license.\n\n\nTransfer Learning\n\nAs detailed in the original tutorial, there are two main alternatives for training\nthe classifier.\n\n\n  Finetuning the pretrained model. We start from a model pretrained on ImageNet,\nreplace the last layer by a binary classifier, and train the resulting model as\nusual.\n  Using a pretrained model as a feature extractor. The pretrained model weights\nare frozen and we run this model and store the outputs of the last layer\nbefore the final classifier. We then train a binary classifier on the\nresulting features.\n\n\nThese two approaches are not mutually exclusive: the second approach can be\nused to train a new top layer, and then the lower layers can be “unfrozen” to\nfinetune the entire model with a very small learning rate for a few epochs.\n\nIn this post we focus on the second alternative. We use a ResNet-18 model; the\nResNet family of models was introduced at the end of 2015 and is now\nvery widely used in computer vision. In 2015, the original ResNet model training\nrequired several weeks of GPU compute time 1. Using transfer learning we can\nbuild our pythons vs camels model with less than a minute of\ncompute time on a 2015 laptop CPU - a significant improvement. We have an OCaml implementation of\nthis network using the ocaml-torch bindings\nand for which pre-trained weights are available.\n\nThe code to fine-tune the model can be found in this file;\nlet’s have a more in-depth look at it. First, we load the images from our dataset.\n\nlet dataset = Imagenet.load_dataset ~dir:Sys.argv.(2) ~classes:[\"camel\"; \"python\"] () in\nDataset_helper.print_summary dataset;\n\n\n\nThe print_summary function prints the dimensions of the tensors that have been\ncreated. 
For training the tensor has shape 822x3x224x224, which corresponds to\n822 images of height and width both 224 with 3 channels (PyTorch uses the NCHW\n– Num samples x Channels x Height x Width –\nordering for image data). The testing image tensor has dimensions 249x3x224x224,\nso there are 249 images with the same size as used in training.\n\nThe pixel data from the dataset is converted to features by running a\npre-trained ResNet model. This is done in the following snippet:\n\n(* Precompute the last layer of the pre-trained model on the whole dataset. *)\nlet dataset =\n  let frozen_vs = Var_store.create ~frozen:true ~name:\"rn\" () in\n  let pretrained_model = Resnet.resnet18 frozen_vs in\n  Stdio.printf \"Loading weights from %s.\\n%!\" model_path;\n  Serialize.load_multi_\n    ~named_tensors:(Var_store.all_vars frozen_vs)\n    ~filename:model_path;\n  Stdio.printf \"Precomputing activations, this can take a minute...\\n%!\";\n  Dataset_helper.map dataset ~batch_size:4 ~f:(fun _ ~batch_images ~batch_labels -&gt;\n      let activations =\n        Layer.forward_ pretrained_model batch_images ~is_training:false\n      in\n      Tensor.copy activations, batch_labels)\n\n\n\nThis snippet performs the following steps:\n\n\n  A variable store frozen_vs is created. Variable stores are used to hold\ntrainable variables. However, in this case no training is performed on the\nvariables, so we use ~frozen:true, which should slightly speed up model\nevaluation.\n  A ResNet-18 model is created using this variable store. At this point the\nmodel weights are randomly initialized.\n  Serialize.load_multi_ loads the weights stored in a given file and copies their\nvalues to the model weight tensors. Tensors are named in the serialized file\nin a way that matches the names we used when creating the ResNet model.\n  Finally, for each tensor of the training and testing datasets,\nLayer.forward_ pretrained_model runs the forward pass of the model and\nreturns the resulting tensor. 
In this case the result is a vector of 512\nvalues per sample.\n\n\nNow that we have precomputed the output of the ResNet model on our training and\ntesting images, we will train a linear binary classifier to distinguish pythons\nfrom camels. We start by defining a model; for this we need a variable store to\nhold the trainable variables. Then we run gradient descent to optimize\nthe cross-entropy loss between the ground truth and the model predictions. As we\nonly have to train a small linear model, we only loop over the dataset a small\nnumber of times. Overall, this should run in less than a minute even on a\nlaptop CPU and achieve near 100% accuracy. This is significantly faster than the\nmultiple weeks of GPU time used to train the original ResNet and is one of the\nappeals of transfer learning.\n\nlet sgd = Optimizer.sgd train_vs ~learning_rate:0.001 ~momentum:0.9 in\nfor epoch_idx = 1 to 20 do\n  Dataset_helper.iter dataset ~batch_size ~f:(fun _ ~batch_images ~batch_labels -&gt;\n      let predicted = model batch_images in\n      (* Compute the cross-entropy loss. *)\n      let loss = Tensor.cross_entropy_for_logits predicted ~targets:batch_labels in\n      Optimizer.backward_step sgd ~loss);\n  (* Compute the validation error. *)\n  let test_accuracy = Dataset_helper.batch_accuracy dataset `test ~batch_size ~predict:model in\n  Stdio.printf \"%3d   test accuracy: %.2f%%\\n%!\" epoch_idx (100. *. test_accuracy)\ndone\n\n\n\nUsing ImageNet Labels\n\nReaching 100% accuracy on this pythons vs camels dataset is quite amazing.\nHowever, ImageNet has categories for pythons and camels, so couldn’t we just\nstick with the original ResNet-18 network and compare the scores of the python\nand camel classes?\n\nlet camel_idx = 354\nlet python_idx = 62\n\n(* Prints the proportion of python images in a directory. *)\nlet process model ~dir =\n  (* Load all the images in a directory. 
*)\n  let images = Imagenet.load_images ~dir in\n  Tensor.print_shape images ~name:dir;\n  (* Run the model on the images and compute all class logits. *)\n  let logits = Layer.forward_ model images ~is_training:false in\n  (* Isolate the logits for python and camel classes. *)\n  let python_logits = Tensor.narrow logits ~dim:1 ~start:python_idx ~length:1 in\n  let camel_logits = Tensor.narrow logits ~dim:1 ~start:camel_idx ~length:1 in\n  let python_proba =\n    (* Compute python &gt;= camel and the mean to get proportion of python images. *)\n    Tensor.(mean (ge1 python_logits camel_logits |&gt; to_type ~type_:(T Float)))\n    |&gt; Tensor.to_float0_exn\n  in\n  Stdio.printf \"Python: %.2f%%\\n%!\" (100. *. python_proba);\n  Stdio.printf \"Camel : %.2f%%\\n%!\" (100. *. (1. -. python_proba))\n\n\n\n\nIf we do this, with no finetuning, we can get an accuracy of &gt;98% over the entire dataset.\n\nA More Challenging Problem\n\nLet us try a more challenging problem. There are two different kinds of\ncamels. The perl camel has a\nsingle hump (it is also known as the arabian camel) whereas the\nocamel has two humps (it is also known as the\nbactrian camel). ImageNet does not have two different categories for these and lumps them\ntogether as “camel”, but luckily Open Images V5 does have separate labels. We\nhave created a very small dataset, only 165 photos of arabian camels and 70\nphotos of bactrian camels for training and 55 and 23 photos, respectively, for\nvalidation. Some examples of these images are shown below:\n\n\n\n\nCaption: Example images of arabian camels and bactrian camels. From left to right, top to bottom:\nOman_7251 by Luca Nebuloni,\nCamels in Dubai by Liv Unni Sødem,\nShip of desert by Tanya.K.,\nMiami Metro Zoo Camello by Jorge Elías,\nCamels by J. Todd Poling,\nand Camel! 
by Beatrice Murch.\nAll images used under CC BY 2.0 license.\n\n\nLet us run our code again on this camel vs camel dataset and plot the training\nloss together with the accuracy on the testing set.\n\n\n\nIt works! And we get &gt;90% test accuracy even with our small training data set.\n\nWhy does this work? Because the lower layers in the ResNet-18 network learn to\nidentify common patterns, it is only near the top that the specialization into\nthe ImageNet classes takes place. We remove the top layer and use the learned\nfeatures of the lower layers to build a classifier for our small\n2-class camel dataset.\n\nThanks for reading. The full code can be found in\nfinetuning.ml and\npredict_labels_only.ml.\n\n\n  \n    \n      14 days for ResNet18, 52 days for ResNet101: http://torch.ch/blog/2016/02/04/resnets.html &#8617;\n    \n  \n\n",
        "url"      : "https://blog.janestreet.com/of-pythons-and-camels/",
        "image"    : "https://blog.janestreet.com/of-pythons-and-camels/camel-identify.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Thoughts from AAAI 2019",
        "date"     : "May 13, 2019",
        "authorId" : "lpuchallafiore",
        "author"   : "Loren Puchalla Fiore",
        "tags"     : [],
        "minsToRead" : 4,
"content"  : "At Jane Street, for the last several years, we have been increasingly interested\nin machine learning and its many use cases. This is why it was exciting when\nearlier this year a few of my colleagues and I had the opportunity to\nattend the AAAI 2019 conference. We’d like to take this space to share with you\nsome of the interesting projects and themes we saw at the conference.\n\nInterpreting Neural Networks\n\nNeural networks can achieve superhuman results in a number of problem domains,\nsuch as image classification and game-playing. However, the learned network is\noften a black box that “magically” computes a good answer. Attempting to\nunderstand what networks are doing and using that knowledge to build\ninterpretable AI is therefore a large research problem.\n\nSome recent results (Gradient visualization, DeepLIFT, InfoGAN) show it is\npossible to gain some insight into neural networks by looking at layer\nactivations. But at AAAI we saw Ghorbani, et al. show that these techniques are fragile and that adversarial\ntechniques can be used to generate inputs that arbitrarily move layer\nactivations while still giving the correct classification result. We also saw\nseveral papers, such as this one on climate, discussing the benefits of\nclassical AI over deep networks with respect to interpretability.\n\nAt the same time, many presentations showed improvements in the ability to\ninterpret more types of models, such as deep Q-networks, and in building interpretability into the\ntraining by using specialized network structures to encourage the\nlearned model to take on semantically meaningful grammar-based behavior at each\nlayer.\n\nAI for Social Good\n\nAn exciting topic that we saw much discussion of at AAAI was the use of\nartificial intelligence and machine learning for social good. 
There were\ninteresting papers on a number of applications including fake news\ndetection and\nfiltering; using image classification to detect and curtail human\ntrafficking; and statistical\nmethods for feature engineering of data collected by citizen scientists.\n\nTwo larger areas I found fascinating were social AI and\nfairness in machine learning. Social AI is concerned with the construction of\nrobots and conversational agents that exhibit social characteristics (e.g. small\ntalk, facial expressions, give-and-take conversation, …). It has been shown that\nthese social agents perform better than agents without social cues in early\nliteracy education, and applications in this field were discussed at length in Cynthia\nBreazeal’s keynote.\n\nFairness in machine learning attempts to resolve the growing concern about\nautomated decision models with respect to protected classes like race, gender,\nor other axes, and the resulting policies that come from these automated models.\nS. Ghili, et al. and C.\nDimitrakakis both described models that use latent variables\nin a Bayesian setting to describe fairness for supervised tasks, and M. Olfat,\net al. showed how to\nenforce fairness in unsupervised problems (such as clustering insurance\nactuarial data).\n\nGames and Simulation\n\nOne of the workshops at AAAI was on games and simulations for artificial\nintelligence. There were a few interesting themes in the workshop. One was that\nas AI techniques are applied to increasingly complex real-world environments,\nthere is a need for more sophisticated, high-fidelity simulations for training\npurposes. For example, in order for self-driving cars to reach the point where\nthere is a high degree of trust in their safety, many hours behind the wheel are\nneeded. The more of this that can be done in simulation, the more cost-effectively\nand rapidly the solutions will arrive. 
However, the many ways in which the real\nworld can differ from simulation can undermine the simulation’s effectiveness.\n\nAs a method of addressing the gap between simulation and reality, one simple but\npowerful technique that came up was the use of randomization in simulations.\nThis came out most clearly perhaps in an invited talk by Maciek Chociej of\nOpenAI. Maciek discussed techniques used for training a human-like robot hand\nfor dexterous manipulation of physical objects. The\nrobot was trained entirely in simulation and the skills learned effectively\ntransferred to a physical robot. There are many opportunities for a mismatch\nbetween simulation and reality in such a project. This “reality gap” exists in\nperception (cameras applied to real lighting conditions) and in the imprecise\nmodeling of the physical robot (friction, slippage, simplified assumptions of\napplying torque to joints instead of tendon-based actuation, rigid body contact\nmodels instead of deformable body contact, and so on). By applying randomness\nthroughout the simulation for various parameters that can differ from the real\nworld, the learned policy became less sensitive to these parameters, leading to\nbetter generalization. We’ve been thinking about and discussing applications of this\nwithin our own business, where real-world markets differ from simulations.\n\nWe saw many more exciting talks at the conference, far too many to cover here in\ndetail. We are excited to see where machine learning will grow, and are looking\nforward to trying to make our own contributions to this field.\n\nDid you attend AAAI or another machine learning conference recently? See\nanything interesting? Let us know in the comments below!\n",
        "url"      : "https://blog.janestreet.com/thoughts-from-aaai-19/",
        "image"    : "https://blog.janestreet.com/thoughts-from-aaai-19/AAAI.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Learning ML Depth-First",
        "date"     : "April 17, 2019",
        "authorId" : "jsomers",
        "author"   : "James Somers",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "If you haven’t heard of it, Depth First\nLearning is a\nwonderful resource for learning about machine learning.\n\nGrown out of a Google AI residency, the DFL program builds curricula around\nspecific ML papers. If you were intrigued by the AlphaGoZero paper, for\ninstance, but felt you couldn’t fully appreciate it – maybe you’re a little\nrusty on the Bellman equation, or you haven’t spent much time with Monte Carlo\nTree Search – DFL has built an entire self-paced\nclass, with background\nreading, lectures, and practice problems, that culminates in the paper itself.\nSo far, they’ve built guides like this for\nDeepStack,\nInfoGAN, and\nTRPO.\n\nLast year, we decided to sponsor Depth First Learning by funding grants for four\nfellows to create new curricula, and better yet, to use their new materials to\nrun 6-week, open, online classes. To our delight, 113 people applied, and late\nlast week, DFL announced the recipients.\n\n\n  Steve Kroon - Stellenbosch (South Africa) - Variational Inference with\nNormalizing Flows\n  Sandhya Prabhakaran - New York (USA) - Spherical\nCNN\n  Bhairav Mehta - Montreal (Canada) - Stein Variational Gradient\nDescent\n  Vinay Ramasesh, Piyush Patil, and Riley Edmunds - Berkeley (USA) Resurrecting\nthe sigmoid in deep learning through dynamical\nisometry\n\n\nCongratulations! We can’t wait to take your new courses.\n\nAt Jane Street, technical education has always been a core part of the culture.\nIt’s not just about having a library in the office (though that helps, too) –\nwe’re constantly looking for ways to help engineers go a level deeper, whether\nthat takes the form of talks, trips to conferences, or internal classes built,\nlike DFL’s, around specific libraries and projects.\n\nWe’ve found that going “depth-first” is always better in the long run.\n",
        "url"      : "https://blog.janestreet.com/learning-ml-depth-first/",
        "image"    : "https://blog.janestreet.com/learning-ml-depth-first/Depth_First_Realigned.svg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Machining the ultimate hackathon prize",
        "date"     : "February 28, 2019",
        "authorId" : "jsomers",
        "author"   : "James Somers",
        "tags"     : [],
        "minsToRead" : 10,
        "content"  : "Jane Street is sponsoring this year’s MakeMIT\nhackathon, and we wanted to create a prize for\nthe winners that would do justice to the maker spirit of the\ncompetition. As makers ourselves – it’s not unusual to find a\n“software” engineer here who hacks on FPGAs or who has a CNC machine\nat home – it felt natural to get our hands dirty.\n\nWe decided to build mechanical 68-key keyboards with Brown Cherry MX\nswitches in a custom-machined walnut case finished, on the bottom,\nwith a laser-engraved brass plate. (“If it’s worth doing, it’s worth\ngoing overboard to do it well.”) Although Browns don’t get much love,\nthey’re quiet enough to use at the office.\n\nWhile everyone’s endgame keyboard is different, we still hope the\nwinners appreciate these.\n\nHow it’s made\n\nIf you took the keyboard apart, you’d find three main pieces: the wood\ncase; the controller board, which wires the keys together and contains\nthe actual firmware for driving the keyboard; and a machine-cut brass\nplate – like a piece of robotic swiss cheese – that the switches are\nmounted to. (On the bottom, there’s another brass plate for\ndecoration.)\n\nThe PCB\n\n\n\nIf you want, it’s not so hard to hand-wire a set of keys into the rows\nand columns of a keyboard, adding diodes for\nanti-ghosting. But\nit’s a lot easier to buy a PCB, which comes with the wires, the\ntraces, the diodes, and a microcontroller (like an Arduino) already\nprogrammed to read the key matrix. We used the PCB for the Tada\n68, which uses the\nQMK keyboard controller\nfirmware.\n\nWe’d love to have fired up KiCad to design our own PCB, but we have\nour regular work too!\n\nThe plate\n\n\n\nThe plate was the first piece to be designed, since the case must be\nbuilt around it. It’s simple: just a 1.5mm-thick sheet of brass with\nsquare holes in it, where you snap in your Cherry switches, and\nsmaller round holes, for stabilizers. 
While the plate itself\nstabilizes the keyboard as a whole and gives it a nice heft, the\n“stabilizers” ensure that longer keys, like the space bar, have a\nconsistent feel. (If you hit the left side of the key, you don’t want\nto feel like you’re pressing down on a see-saw.)\n\nTo build the plate, all we had to do was produce a CAD file and send\nit to Laser Boost, in Spain, who did\nthe actual cutting.\n\nConnecting the key switches to the plate was a matter of snapping them\nin, and then through-hole soldering each switch to the corresponding\npad on the underside of the PCB.\n\nBy far the bulk of the work went into designing and building the case.\n\nThe case\n\n\n\nYou can find plenty of pre-built cases for so-called compact\n“60%”-layout\nkeyboards, but these layouts are missing arrow keys. For our purposes,\nthat was a bridge too far; we prefer the rarer 68-key layout. But this\nput us into classic maker territory – with no way to buy an existing\nwooden case that matched our specifications, we had to build our own.\n\nWe based our design on this nice wooden keyboard, and\nbegan by building a simple 3D model in\nRhino, just to get the measurements and\nclearances of the three main parts settled up front.\n\nFor the wood, we used a single large piece of 8-quarter walnut from\nthe Boards & Beams lumber yard\nin Fairfield, NJ. We ran it through a planer and table saw to cut it\ndown to size.\n\nIn the original design, we had little throwaway “tabs” on either side\nof the keyboard, each with a hole in it that could be used to anchor\nthe block of wood to the gantry on the cutting machine. That way, when\nwe flipped the piece upside down, we could be sure that it would\ncontinue to be aligned perfectly – which was necessary to cut the two\nhalves of the hole where the USB cable connects to the PCB. 
But this\nstep was dropped in later designs, where that hole was cut simply by\nturning the block of wood on its side.\n\nThe CNC router we used was a Shopbot Desktop\nMAX. In essence, you\nmount your block of material underneath the CNC to a table. You attach\nsome kind of tool to the head of the machine – in this case, we used\n1/4” and 1/8” square end mills. Then the machine moves on tracks in\nthe X, Y, and Z directions, applying pressure to your material, and\ncutting it according to the precise instructions you provide via your\nCAD program. (For the cuts, we exported our Rhino model into Fusion\n360 for its\nexcellent CAD capabilities.)\n\nThe cut instructions are known as a “tool path,” and the first paths\nwe used involved simple facing and contouring operations, basically\nmaking the face of our wood uniform and smooth. These are calibration\nsteps for the later cuts. Using tool no. 4, the CNC machine does this\nby making consistent passes over the whole block.\n\nThe tool path for the “adaptive clearing path” cut, in which we\nactually carved out a space for the PCB, plate, and mounting holes, was\nselected not just to cut the precise shape we wanted, but to minimize\nthe amount of wear and tear on the tool itself, and to avoid throwing\noff unnecessary chips of material. You can see it here:\n\n\n\nAnd here, you can see a simulation of what the actual cut looks like:\n\n\n\nAfter this cut, there were still little bits of material left over\nwhere the quarter-inch tool was too large to go, especially around the\nsmall mounting holes. We cleaned these up with a second round of\nclearing using a smaller 1/8th-inch tool, and then oriented the wood\non its side to carve out the port for the USB cable.\n\nFinally, to give the keyboard a slight lean (for optimal wrist\nposition) we flipped the case upside down, and did another facing\noperation, this time orienting the case with an angle in our\ngantry. 
Making multiple passes with a ball end tool, stepping down the\nangle, we were able to engineer a seamlessly smooth cut, and a\n5-degree grade:\n\n\n\n(We’re all still learning CNC, but found Michal Zalewski’s Guerilla\nGuide to CNC to be excellent!)\n\nAssembly\n\nPart of what gives the keyboards their solid feel is that there are\nhardly any parts to assemble. After placing the PCB/plate piece into\nthe case, we mounted everything into place using Yardley threaded\ninserts, which\nare pressed into the holes we’d cut earlier. These have machine\nthreads on the inside, which we used to screw the PCB to the case.\n\nThe decorative plate\n\n\n\nA Jane Street project wouldn’t be complete without a little OCaml. As\na finishing touch, we designed a small decorative plate, also in\nbrass, with the Jane Street logo. The twist is that the logo itself\nwas defined programmatically via an OCaml DSL that sits on top of\nSigned Distance\nFunctions.\n\nSigned distance functions represent shapes as functions \nfrom a coordinate to the distance to that shape’s edge (positive if outside \nof the shape, negative if inside, 0.0 if exactly on the edge).\n\ntype shape = x:float -&gt; y:float -&gt; float\nlet circle ~cx ~cy ~r ~x ~y = \n  let dx, dy = cx -. x, cy -. y in\n  (Float.sqrt (dx *. dx +. dy *. dy)) -. r\nlet union ~a ~b ~x ~y =\n  Float.min (a ~x ~y) (b ~x ~y)\n\n\n\nSampling these functions inside of\nMarching Squares\ngives us the poly-lines that we can give to CAD/CAM software to process \nthe tool path for carving the logo into brass.\n\nThe code that sits beside it:\n\nlet circle ~r = circle ~cx:0.0 ~cy:0.0 ~r\n\nlet disk ~ri ~ro = subtract (circle ~r:ro) (circle ~r:ri)\n\nlet notch ~ri ~ro ~iw ~r =\n  let dist = (ro -. ri) in\n  let half_notch r = rotate_around ~x:0.0 ~y:ri ~r\n      (rect ~x:(-. iw /. 2.) ~y:(ri -. dist /. 2.0) ~w:iw ~h:(ro -. ri+. dist)) in\n  union [ half_notch r ; half_notch (-. 
r) ]\n\nlet ring ~ri ~ro ~iw ~r = subtract (disk ~ri ~ro) (notch ~ri ~ro ~iw ~r)\n\nlet letter_to_rotation l =\n  let idx = (Char.to_int l) - (Char.to_int 'a') in\n  Float.pi -. 2.0 *. Float.pi *. (Float.of_int idx) /. 26.0\n\nlet gen_logo chars =\n  let initial_radius, ring_thickness, space_between = 17.5, 4.0, 8.5 in\n  let gap_width, gap_angle = 9.0, 0.1 in\n  let _, logo = List.fold chars ~init:(initial_radius, []) ~f:(fun (start, prev) chr -&gt;\n      let next = ring\n          ~ri:start ~ro:(start +. ring_thickness)\n          ~iw:gap_width ~r:gap_angle in\n      start +. space_between, (rotate ~r:(letter_to_rotation chr) next) :: prev)\n  in logo |&gt; union\n\nlet js_logo = gen_logo ['j'; 's'; 'c']\n\n\n\nWe hope the hackathon contestants enjoy their prizes – we certainly\nhad fun making them!\n",
        "url"      : "https://blog.janestreet.com/hackathon-keyboards/",
        "image"    : "https://blog.janestreet.com/hackathon-keyboards/keyboard.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Accelerating Self-Play Learning in Go",
        "date"     : "February 28, 2019",
        "authorId" : "dwu",
        "author"   : "David Wu",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "At Jane Street, over the last few years, we’ve been increasingly exploring machine learning to improve our models. Many of us are fascinated by the rapid improvement we see in a wide variety of applications due to developments in deep learning and reinforcement learning, both for its exciting potential for our own problems, and also on a personal level of pure interest and curiosity outside of work.\n\nAbout a year ago, motivated by AlphaGo and AlphaZero, I started a personal research project outside of work to experiment with neural net training in Go. While there was plenty to experiment with in just supervised learning and search alone, it wasn’t too long before I accumulated a variety of ideas that would each take at least part of an actual self-play training run to properly test. This was tricky, since self-play training in a game as large as Go requires a very large amount of compute. A single full run by AlphaZero required thousands of TPUs over several days, Facebook’s ELF OpenGo used two thousand GPUs for more than a week, and Leela Zero, an distributed open-source project running off of computation donated by volunteers has taken more than a year to reach top levels.\n\nWith the help of Jane Street, I’ve now been able to perform a variety of short runs and a medium-length run to begin testing some of these ideas and techniques, some of which might also have further applications beyond just Go. While our runs were not nearly as long as any of the full runs mentioned above, we still achieved strong professional or possibly superhuman levels of strength, along with other interesting and promising results. 
Today, we’ve released a paper detailing these results.\n\n\n  Paper: Accelerating Self-Play Learning in Go\n  Source code and trained nets: https://github.com/lightvector/KataGo\n  Live bot for play online: kata-bot (OGS)\n\n\nSee the paper for more details, but some points of interest:\n\n\n  Although we have not yet been able to test further, by a very rough estimate these new techniques together appear to accelerate learning up to at least a strong human professional level by as much as a factor of 5 compared to Leela Zero.\n  For some earlier parts of training, the improvement was almost a factor of 100. With the code linked above, going from zero up to moderate expert level (amateur-dan level) on the full 19x19 board should now be possible for anyone with merely a few GPUs in as little as a day or two!\n  At a minor cost in strength, and with the right training architecture, a single neural net can be trained to play well on a wide range of board sizes simultaneously.\n  Adding score maximization as a secondary objective accelerates learning, at least up to the point we were able to test so far. It also allows the bot to play reasonably in handicap games even without any other special methods, something known to be difficult for other “zero-trained” bots. Example game 1, Example game 2, Example game 3.\n\n\n\n\n\nRelative Elo rating (y-axis) of three progressive-sized neural nets from our run versus Leela Zero (LZ), plotted by estimated cumulative self-play computation required (x-axis, see paper for details on units).\n\n\n\n\n\n\n\nNeural net prediction of final ownership of all board points. Red through green increasingly indicate likely white ownership, cyan through magenta increasingly indicate likely black ownership.\n\n\n\n\nAlong the way, I also encountered many interesting technical details or directions for further exploration. 
In the coming months, I hope to run some more experiments and present some of these other ideas and results in future blog posts.\n",
        "url"      : "https://blog.janestreet.com/accelerating-self-play-learning-in-go/",
        "image"    : "https://blog.janestreet.com/accelerating-self-play-learning-in-go/go.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Playing Atari Games with OCaml and Deep Reinforcement Learning",
        "date"     : "February 2, 2019",
        "authorId" : "lmazare",
        "author"   : "Laurent Mazare",
        "tags"     : [],
        "minsToRead" : 16,
        "content"  : "In a previous blog post\nwe detailed how we used OCaml to reproduce some classical deep-learning results\nthat would usually be implemented in Python. Here we will do the same with\nsome Reinforcement Learning (RL) experiments.\n\nThe previous post was using TensorFlow but this time we will be using\nPyTorch through some\nocaml-torch bindings.\nThis will let us train an agent playing Pong. The PyTorch website features\na dedicated reinforcement learning tutorial\nusing the Python api, this tutorial provides more details on RL and on the DQN\nalgorithm that we are using in this post so this is a nice complementary read.\n\nThe RL techniques we used here have been explored for a couple of\nyears. The only difference is that we are implementing them in OCaml. Of\ncourse, this is fun, but there is a practical benefit too. RL usually involves\nmore infrastructure and code than supervised learning so it’s a place where\nhaving a strong type system could be useful, e.g. to let you easily refactor\nsome components without being scared of breaking all the agents that rely on\nit. Using Python with some good test coverage is the common way to achieve this\nbut with OCaml you can get even stronger guarantees by relying both on testing\nand on the type system.\n\nReinforcement Learning\n\nReinforcement Learning is a sub-field of machine learning where an agent gets\nto interact with an environment by observing the state of the world, deciding\non an action and submitting it to the environment. The environment then updates\nits state according to this action and returns some new observation as well as\na potential reward. The goal for the agent is to maximize the rewards that it gets by\ninteracting with the environment.\n\nDeep Reinforcement Learning combines the modern Deep Learning approach to\nReinforcement Learning. 
One of the early algorithms in this domain is Deepmind’s\nDeep Q-Learning algorithm which was used to\nmaster a wide range of Atari 2600 games.\nIn this context the observations are the values taken by the pixels from the\nscreen (with a resolution of 160x192); the actions correspond to pressing the\ndifferent buttons, e.g. left, right, or fire, and the rewards come from the\nscore of the game.\n\nIn order to interact with Atari games we rely on the\nOpenAI gym environment, which makes it very easy to\ntry different Atari games or various other tasks. The OCaml signature for the\nenvironment is as simple as one would expect:\n\nmodule Env : sig\n  type t\n\n  (** [create ()] sets up a new environment. *) \n  val create : unit -&gt; t\n\n  (** [reset t] creates a new game session and returns the first observation. *)\n  val reset : t -&gt; observation\n\n  (** [step t ~action] applies [action] to the environment and returns the\n      new observation, the reward received by the agent, and a boolean\n      set to true if the game has finished. 
*)\n  val step : t -&gt; action:int -&gt; observation * float * bool\nend\n\n\n\nQ-values\n\nConsider the function Q*(s, a) which, given a state s and an\naction a, returns the total reward that an agent playing perfectly would\nget starting from state s if it used action a.\nIf Q* could be computed exactly it would be straightforward to build\na perfect agent by selecting the action that maximizes Q*(s, a).\nMost of the time, this function is not known; in a nutshell, Q-learning is the\nprocess of approximating it.\nDeep Q-Networks (DQN) build such an approximation using (deep) neural networks.\n\nWe know that Q* must satisfy the Bellman equation which states\nthat Q*(s, a) is the sum of the reward r received when\nperforming action a and the Q-value from the next state s’ (as\nreturned by the environment) using the action leading to the highest\nQ* value.\n\nQ*(s, a) = r + γ · max over a’ of Q*(s’, a’)\n\nγ is the discount factor, a constant between 0 and 1 representing\nthat future rewards are to be discounted, i.e. getting a reward now is better\nthan later. A typical value is close to 1, e.g. γ = 0.99.\n\nMore generally, any agent defined by a policy function π which maps\neach state s to the action performed on this state has an\nassociated Q-value function Qπ that satisfies a similar equation:\n\nQπ(s, a) = r + γ · Qπ(s’, π(s’))\n\nOur agent uses the action that maximizes its internal approximation of\nQ*. So to find a good Q-value approximation we look for a function Q that\napproximately satisfies the same Bellman equation as Q*. In particular, we\nwill use a learning algorithm that attempts to minimise the loss\n\nL = E[(Q(s, a) − (r + γ · max over a’ of Q(s’, a’)))²]\n\nAs our Q-value approximation improves, the policy that it implies for\nthe agent accumulates more rewards.\n\nEach optimization step runs the following ocaml-torch snippet; the model we use\nis called q_model. 
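Before looking at the ocaml-torch version, the Bellman target can be illustrated with a tiny tabular sketch in plain OCaml; the two-state Q table and the observed reward below are made-up numbers:

```ocaml
(* Toy tabular illustration of the Bellman target: reward plus the discounted
   best Q-value available from the next state. *)
let gamma = 0.99

(* q.(s).(a): current Q-value estimate for state s and action a
   (hypothetical values for a two-state, two-action problem). *)
let q = [| [| 0.0; 1.0 |]; [| 0.5; 0.2 |] |]

let max_over_actions values = Array.fold_left Float.max neg_infinity values

(* Bellman target for a transition that yielded [reward] and led to [next_state]. *)
let bellman_target ~reward ~next_state =
  reward +. (gamma *. max_over_actions q.(next_state))

let () =
  (* The best Q-value in state 0 is 1.0, so the target is 1.0 +. 0.99 *. 1.0. *)
  assert (Float.abs (bellman_target ~reward:1.0 ~next_state:0 -. 1.99) < 1e-9)
```

DQN does the same computation, except that the table is replaced by a neural net evaluated on a batch of transitions.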
As our agent is assumed to take the optimal action, we\ntake the maximum of the expected Q-values for the next step.\n\nlet qvalues = Layer.apply q_model state in\nlet next_qvalues = Layer.apply q_model next_state |&gt; Tensor.max in\nlet expected_qvalues = Tensor.(rewards + f gamma * next_qvalues) in\n(* minimize the mean squared error between [qvalues] and [expected_qvalues] *)\nlet loss = Tensor.mse_loss qvalues expected_qvalues in\nOptimizer.backward_step t.optimizer ~loss;\n\n\n\nWhen it comes to action selection we use an ε-greedy\npolicy.  Rather than always taking the best action according to the\ncurrent Q function there is some small probability ε at\neach step of taking a random action instead. This helps the agent\ndiscover new states and so gives more weight to exploration in the\nexploration vs exploitation\ntradeoff.  The\nvalue of ε decays over time. This corresponds to the following code\nsnippet:\n\nlet action t state ~total_frames =\n  let epsilon = Float.max 0.02 (0.5 -. Float.of_int total_frames /. 1_000_000.) in\n  if Float.(&lt;) epsilon (Random.float 1.)\n  then begin\n    let qvalues = Layer.apply q_model state in\n    Tensor.argmax qvalues |&gt; Tensor.to_int0_exn\n  end else Random.int t.actions\n\n\n\nModeling Q-values\n\nAn observation returned by our environment consists of the pixel values for the\nwhole screen. The color information is not very relevant in Pong so we \nconvert the frame to grayscale and downscale it to 80x80. Seeing a single frame\nis not enough to know about the ball direction so we consider the\ndifference between two consecutive frames. This is implemented in a\npre-processing function with the following signature:\n\nval preprocess : Tensor.t -&gt; Tensor.t\n\n\n\nAs Andrej Karpathy noted in his blog post Pong from\nPixels, there is no need\nto use convolutions. A simple two layer model is enough to do the\ntrick. 
The input to the model is an 80x80 image that we flatten before\napplying the first linear layer.\n\nlet model vs actions =\n  let linear1 = Layer.linear vs ~input_dim:(80 * 80) 200 in\n  let linear2 = Layer.linear vs ~input_dim:200 actions in\n  Layer.of_fn (fun xs -&gt;\n    Tensor.flatten xs\n    |&gt; Layer.apply linear1\n    |&gt; Tensor.relu\n    |&gt; Layer.apply linear2)\n\n\n\nAn issue with Q-learning is that the states that the agent observes in\ntwo consecutive frames are very correlated. Learning only on the most\nrecent data could easily get the agent to ‘forget’ about the more\ndistant past. To mitigate this we use a replay memory to store a large\namount of previous transitions. Each transition is composed of a\nstate, an action, the returned reward and the subsequent state. On a\ntraining step we extract a random batch of transitions from this\nmemory and hence hopefully provide less correlated data. This\nprocess is called experience replay.\n\nThe replay memory is implemented by a ring buffer and has the following\nsignature. The main functions are create, push to add a transition to the\nmemory and sample to get a random batch of elements from the current memory.\n\nmodule Replay_memory : sig\n  type t\n  val create : capacity:int -&gt; t\n  val push : t -&gt; transition -&gt; unit\n  val sample : t -&gt; batch_size:int -&gt; transition list\nend\n\n\n\nThe training loop then processes one game (or episode) at a time. 
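As an aside, such a replay memory can be sketched in a few lines of plain OCaml. This version is simplified to store bare values rather than full transitions, and the names are ours; the real implementation is in the repo:

```ocaml
(* Sketch of a ring-buffer replay memory. Once [capacity] elements have been
   pushed, new elements overwrite the oldest ones. *)
type 'a t = {
  mutable data : 'a array;
  mutable size : int;
  mutable next : int;
  capacity : int;
}

let create ~capacity = { data = [||]; size = 0; next = 0; capacity }

let push t x =
  (* Lazily allocate the backing array on the first push. *)
  if Array.length t.data = 0 then t.data <- Array.make t.capacity x;
  t.data.(t.next) <- x;
  t.next <- (t.next + 1) mod t.capacity;
  t.size <- min (t.size + 1) t.capacity

(* Sample a batch uniformly at random (with replacement) from the memory. *)
let sample t ~batch_size =
  List.init batch_size (fun _ -> t.data.(Random.int t.size))

let () =
  let memory = create ~capacity:3 in
  List.iter (push memory) [ 1; 2; 3; 4 ]; (* 4 overwrites the oldest entry, 1 *)
  assert (memory.size = 3);
  assert (List.for_all (fun x -> x >= 2 && x <= 4) (sample memory ~batch_size:5))
```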
For each of these\nan internal loop runs until the game is over by:\n\n\n  Getting the agent action using the ε-greedy policy described previously.\n  Giving the action to the environment, and getting back the reward and next observation.\n  Pushing this transition to the replay memory.\n  Extracting a random batch from the replay memory and using it to optimize\nthe approximated Q function.\n\n\nThis leads to the following OCaml code:\n\nfor episode_idx = 1 to total_episodes do\n  let rec loop state =\n    let action = DqnAgent.action agent state in\n    let next_state, reward, is_done = Env.step env ~action in\n    let next_state = preprocess next_state in\n    (* Add the transition to the replay memory. *)\n    Replay_memory.push memory { state; action; next_state; reward; is_done };\n    (* Perform an optimization step using a random batch from the replay memory. *)\n    let batch = Replay_memory.sample memory ~batch_size in\n    DqnAgent.learning_step agent batch;\n    if not is_done then loop next_state\n  in\n  loop (Env.reset env |&gt; preprocess)\ndone\n\n\n\nMastering Pong\n\nIn Pong the player has to bounce a ball back at its opponent. If it\nmisses, the opponent gets a point; if the opponent misses, the player\ngets a point. These two events respectively correspond to a reward of\n-1 and +1. Each match consists of 21 points. We sum the rewards that\nthe agent receives to get a score that can range from -21 to 21.\n\nThe following two curves show the evolution of the scores achieved by our DQN\nagent in two different training sessions, showing how noisy training is. 
In\nboth cases it takes a bit more than 100 matches for the agent to manage to\nscore consistently, but after that it quickly improves and gets far better than\nthe game’s hard-coded agent.\n\n\n\nWe can also visualize the agent playing a match.\n\n\n  \n\n\nThe source code for the pong example can be found in the\nGitHub ocaml-torch repo.\n\nThe actual implementation is a bit more involved than what has been described\nso far. Rather than using a single model for our approximated function Q,\nwe use an additional target model Q'. The right hand side of the Bellman\nequation uses Q' and we only update Q' after some fixed number of\nupdates by copying the weights from Q, whereas Q gets continuously\nupdated. This target Q-network trick is also used in the\noriginal DQN paper.\n\nPlaying Breakout\n\nLet’s try a more challenging Atari game: Breakout. In order to get DQN to work\non this we used the following tweaks.\n\n\n  The agent’s inputs are still downsampled grayscale images - however this time\nthe agent is given the last 4 frames so that it can infer the movement.\n  The model uses a Convolutional Neural Network.\n  We use episodic life: each time it loses a life the agent is told\nthat the game is over. This helps the agent more quickly figure out\nhow bad it is to lose a life.\n  Rather than the mean squared error for the Bellman-equation-based loss, we\nuse the more robust Huber loss.\nThis has the same effect as clipping the gradients of the loss with\nrespect to the model to 1.\n\n\nThe resulting algorithm takes far longer to train on this game. The\nfollowing plot shows the training score evolution as a function of the\nnumber of frames that have been played (an episode lasts for ~150 to\n~2000 frames). 
Training is very noisy, so the curve shows the score\naveraged over the last 100 episodes.\n\n\n\nAfter training for 10 million frames the DQN agent almost manages to clear the\nscreen on its best runs:\n\n\n  \n\n\nThe source code can again be found on\nGitHub.\n\nImproving Type Safety\n\nUsing OCaml to implement DQN is a nice exercise; now let’s see what benefits\nthe OCaml type system could bring. For Pong we used a pre-processing function\nthat converts a tensor containing an RGB image of the screen to a lower\nresolution tensor containing the difference between two consecutive grayscale\nframes. We also used a two layer model that takes a pre-processed image and\nreturns the Q-values.\nThese functions have the following signatures:\n\nval preprocess : Tensor.t -&gt; Tensor.t\nval model : Tensor.t -&gt; Tensor.t\n\n\n\nThese signatures don’t provide much safety: the number of dimensions for the\ntensors is not even specified. The following snippet would compile without any\ntype error despite the pre-processing step being omitted. Hopefully a dimension\nmismatch could help us catch this at runtime but we would rather enforce this\nin a static way.\n\nlet obs, reward, is_done = Env.step env ~action in\n(* No pre-processing has been applied! *)\nlet q_values = model obs in\n\n\n\nThere have been some interesting discussions recently on how to avoid\nthis kind of issue, e.g. using Tensor with named\ndimensions. Here we\nwill take a less generic approach. We introduce new types that\nabstract the Tensor.t type, have some specified number of\ndimensions and also provide more information on what the dimension\nrepresents. 
We also introduce a couple of empty types to represent the\nvarious dimension types.\n\ntype _ tensor1\ntype (_, _) tensor2\ntype (_, _, _) tensor3\ntype (_, _, _, _) tensor4\n\ntype n (* batch dimension *)\ntype c (* channel dimension for images *)\ntype h (* height dimension for images *)\ntype w (* width dimension for images *)\ntype a (* action dimension - for q-values *)\n\n\n\nThe new types can then be used to reflect the dimensions that we expect for the\npre-processing and model functions.\n\nval preprocess : (n, c, h, w) tensor4 -&gt; (n, h, w) tensor3\nval model : (n, h, w) tensor3 -&gt; (n, a) tensor2\n\n\n\nNow the code snippet where we forgot the pre-processing would not compile anymore.\nWe can also give proper types to some generic functions, e.g. require that the\ndimensions are the same for a sum or remove the last dimension when taking the\nmaximum over this last dimension.\n\nval add3 : ('a, 'b, 'c) tensor3 -&gt; ('a, 'b, 'c) tensor3 -&gt; ('a, 'b, 'c) tensor3\nval max_over_last_dim3 : ('a, 'b, 'c) tensor3 -&gt; ('a, 'b) tensor2\n\n\n\nOur encoding was a bit crude so we had to create specific functions depending\non the number of dimensions. In the future, this is the kind of thing that modular\nimplicits will help with.\n",
        "url"      : "https://blog.janestreet.com/playing-atari-games-with-ocaml-and-deep-rl/",
        "image"    : "https://blog.janestreet.com/playing-atari-games-with-ocaml-and-deep-rl/atari.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "L2 Regularization and Batch Norm",
        "date"     : "January 29, 2019",
        "authorId" : "dwu",
        "author"   : "David Wu",
        "tags"     : [],
        "minsToRead" : 24,
        "content"  : "This blog post is about an interesting detail about machine learning\nthat I came across as a researcher at Jane Street - that of the \ninteraction between L2 regularization, also known as\nweight decay, and batch normalization.\n\nIn particular, when used together with batch normalization in a convolutional neural net with\ntypical architectures, an L2 objective penalty no longer has its original regularizing effect. \nInstead it becomes essentially equivalent to an adaptive adjustment of the learning rate!\n\nThis and similar interactions are already part of the awareness in the wider ML literature, \nfor example in Laarhoven or Hoffer et al..\nBut from my experience at conferences and talking to other researchers, I’ve found\nit to be surprisingly easy to forget or overlook, particularly considering how commonly both \nbatch norm and weight decay are used.\n\nFor this blog post, we’ll assume that model fitting is\ndone via stochastic gradient descent. With ADAM or other alternative\noptimizers, the analysis presented here does not necessarily apply\n(particularly as the notions of weight decay versus an L2 objective\npenalty now become nonequivalent, as explored by Loshchilov and Hutter), \nand the behavior will be potentially different.\n\nL2 Regularization / Weight Decay\n\nTo recap, L2 regularization is a technique where the sum of squared\nparameters, or weights, of a model (multiplied by some coefficient) is added\ninto the loss function as a penalty term to be minimized.\n\nLet  be the collection of model weights, and  be any\nmini-batch, and  be the learning rate, and  be the\ncurrent error we are minimizing with respect to the data. 
With L2 regularization our \noverall loss function will be of the form:\n\nL(X, w) = E(X, w) + λ Σ_i w_i²\n\nDuring gradient descent, our update step will look like:\n\nw ← w - η ∇_w E(X, w) - 2ηλw\n\nSo the effect of the L2 penalty term λ Σ_i w_i² is that on each optimization step \nin addition to stepping based on a gradient to better fit the training data, all model weights also \ndecay proportionally towards zero by a small factor of 2ηλ. \nThis is why this technique is also known as “weight decay”.\n\nPurpose/Intuition\n\nUsually, the motivation for L2 regularization is to reduce overfitting.\n\nOne way to intuit this is that weight decay will continuously squeeze model weights to be too small, \nincreasing error on the training data. However, important weights that capture common regularities in the \ndata will consistently recover back up on future steps, re-reducing that error. Weight values that are merely due to \nnoise in a particular batch or to only small numbers of data samples and that do not \naffect the error much will not recover so readily. In this way, the final model weights\n(hopefully) fit more of the broader regularities in the data and less of the noise. \nThis is how an L2 penalty regularizes the model.\n\nAlternatively, a Bayesian view would be that the penalty λ Σ_i w_i² imposes a prior about \nthe “complexity” of the model. In function approximation, models that \nprecisely wiggle to fit every data point in a noisy setting tend to require very large \nweights to generate the necessary sharp kinks to do so and are less likely to generalize. \nModels with smaller weights will generally be smoother and more likely to generalize. \nBy directly penalizing large weights, we favor smoother and less “complex” models.\n\nBatch Normalization\n\nBatch normalization is a technique where layers are inserted into typically\na convolutional neural net that normalize the mean and scale of the per-channel activations\nof the previous layer. 
Depending on the architecture, this is usually somewhere between each nonlinear activation \nfunction and prior convolutional layers (He et al.).\n\nLet y(w) be the output tensor of a batch norm layer, and x(w) be the output tensor\nof the layer that feeds into it. As before, let w be the model weights of the previous (linear) \nlayer. Then we have:\n\ny_c(w) = (x_c(w) - μ_c(w)) / sqrt(σ_c(w)² + ε)\n\nwhere:\n\nμ_c(w) = the mean of x_c(w) over the mini-batch\n\nσ_c(w)² = the variance of x_c(w) over the mini-batch\n\nand ε is a typically negligible constant that ensures no division by zero. For notation\nwe use c here to denote indices into the relevant vectors and tensors, and we also have w as an argument\neverywhere to emphasize that we are considering everything as a function of w, as will be useful below.\n\nSo for each channel c, y_c(w) will be the same as x_c(w) except shifted and rescaled to have mean\nzero and standard deviation one. Typically one or two learnable parameters are further added, but we omit those for simplicity as they do not affect the analysis in this post.\n\nPurpose/Intuition\n\nBatch normalization was introduced by a 2015 paper (Ioffe and Szegedy), with\nthe idea of stabilizing the distribution of layer activations over the course of training, reducing\nthe instability of deeper neural nets to saturate or diverge. But exactly why it helps appears to still be a \ntopic of research (for example, see Santurkar et al. or Bjorck et al.).\n\nEmpirically for convolutional neural nets on some (but not all) problems, \nbatch normalization stabilizes and accelerates the training while reducing the need to \ntune a variety of other hyperparameters to achieve reasonable performance.\n\nThe key property that is relevant for this post is that batch norm layers make the neural \nnet output approximately invariant to the scale of the activations of the previous layers. 
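This invariance is easy to check numerically. Here is a self-contained OCaml sketch that normalizes a single channel of activations, with and without a global rescaling; the concrete numbers are arbitrary:

```ocaml
(* Numeric check that per-channel normalization makes the output
   (approximately) invariant to rescaling the incoming activations. *)
let epsilon = 1e-5

let mean xs = Array.fold_left ( +. ) 0.0 xs /. float_of_int (Array.length xs)

(* Normalize a batch of single-channel activations to mean zero, variance
   (approximately) one; learnable scale/shift parameters are omitted. *)
let batch_norm xs =
  let m = mean xs in
  let var = mean (Array.map (fun x -> (x -. m) ** 2.0) xs) in
  Array.map (fun x -> (x -. m) /. sqrt (var +. epsilon)) xs

let () =
  let xs = [| 0.5; -1.0; 2.0; 0.0 |] in
  let scaled = Array.map (fun x -> 10.0 *. x) xs in
  let a = batch_norm xs in
  let b = batch_norm scaled in
  (* Scaling all activations by 10 leaves the normalized outputs almost
     unchanged; the residual difference comes only from the epsilon term. *)
  Array.iteri (fun i ai -> assert (Float.abs (ai -. b.(i)) < 1e-3)) a
```

Up to the small ε term, the two normalized outputs agree.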
Any such scaling \nwill simply be normalized away, except for the tiny effect of the ε in the denominator.\n\nWhat Happens When Both Are Used Together\n\nWhat happens when both batch norm and an L2 objective penalty are used\ntogether in a convolutional neural net? To first order, the weight decay from the \nL2 penalty on a convolutional layer no longer has an influence on the output of the neural net!\n\nWith a little thought, this should not be surprising. Since batch norm \nmakes the output invariant to the scale of previous activations, and the scale of previous activations\nis linearly related to the scale of the model weights, the output will now be invariant to weight decay’s \nscaling of those weights.\n\nFormally, let w be the model weights for a convolutional layer, let x(w) be the \noutput tensor for that layer. Assume that x(w) feeds into a batch norm layer, and \nlet y(w) be the output of that batch norm layer, viewed as a function of w.\n\nSuppose that as a result of an L2 penalty term or direct weight decay, we scale w by a factor α.\n\nSince convolution is a linear operation, scaling the weight matrix of a\nconvolution simply scales the output. Also straightforwardly from their earlier definitions, μ and σ also scale linearly with α. Therefore:\n\nx_c(αw) = α x_c(w)\n\nμ_c(αw) = α μ_c(w)\n\nσ_c(αw) = α σ_c(w)\n\nThen the new output of the batch norm layer is:\n\ny_c(αw) = (α x_c(w) - α μ_c(w)) / sqrt(α² σ_c(w)² + ε) ≈ y_c(w)\n\nSo a scaling by α approximately has no effect on the output, as expected. Note also that this property does not depend on the batch size. 
(except perhaps if the noise in extremely tiny batch sizes makes it slightly more common for a layer’s activations to be tiny and the eps in the denominator to matter).\n\nNo More L2 Regularization Mechanism\n\nWhat happens when we try to use an L2 objective penalty term with batch normalization present?\n\nSince the neural net’s output is invariant to the scale of the weights W, the mechanism by which the weight decay would normally regularize the neural net is broken!\n\nWithout batch norm, important weights should experience gradients that restore their magnitudes, countering earlier weight decays, whereas weights fitting only noise would on average remain decayed. But with batch norm, all weights will be “equally happy” at the decayed value as at the original value. Since it is a proportional decay, the batch norm layer will automatically “undo” the decay and there will be no gradient to preferentially increase the magnitude of the important entries within W relative to the less important ones.\n\nOr more formally, it’s pretty easy to show that if a given function f(W) is invariant to multiplicative scalings of W, then the direction of the gradient ∇f(W) must also be invariant to multiplicative scalings of W. In other words, weight decay’s scaling of the weights cannot directly alter the direction of future gradient descent steps to favor any components of W over any others (although it could alter the size of the overall steps).\n\nThe Bayesian perspective is another way to intuit why there should be no regularization effect now. An L2 penalty term normally acts as a prior favoring models with lower “complexity” by favoring models with smaller weights. But when the model is invariant to the scale of the weights, an L2 penalty no longer accomplishes this. 
With batch norm, models with smaller weights are no more or less “complex” than ones with larger weights, since rescaling the weights of a model produces an essentially equivalent model.\n\nNew Effect on Gradient Scale and Learning Rate\n\nDoes that mean L2 regularization is pointless with batch norm present? No - actually, it takes on a major new role in controlling the effective learning rate of the model during training. Here’s how:\n\nWithout batch norm, the weights of a well-behaved neural net usually don’t grow arbitrarily, since an arbitrary scaling of all the weights will almost certainly worsen the data loss. In my experience, it’s pretty common for weights to remain near the same order of magnitude that they were initialized at.\n\nBut with batch norm, they are unconstrained, since an increase in the overall magnitude of the weights in any layer will simply result in the subsequent batch norm layer scaling all the activations down again. So the weights can grow significantly over time, and absent any controlling force, in practice they very much do. As we will see, this has a major effect on the magnitude of the gradients.\n\nEffect of Scale on Gradients\n\nConsider as before what happens when we scale the model weights W of a convolutional layer by a factor alpha, when there is a subsequent batch norm layer. What happens to the gradient of the loss function on the data with respect to W?\n\nIntuitively, the gradients should vary inversely with alpha. For example, if a given absolute step dW changes the loss by some amount dL, then doubling all the weights means that after batch norm cuts all the activations in half, the same absolute-sized step dW will only have half as large an effect on the activations, so dL should be halved.\n\nMathematically, this translates to the following (non-rigorous) derivation. 
Heuristically using several times the fact that L(alpha W) ≈ L(W) for any reasonable-scale alpha (where L is the data loss), and letting w be any particular entry within W:\n\ndL/dw evaluated at alpha W\n  = lim as h goes to 0 of [L(alpha W + h e_w) - L(alpha W)] / h\n  ≈ lim as h goes to 0 of [L(W + (h/alpha) e_w) - L(W)] / h\n  = (1/alpha) * lim as h goes to 0 of [L(W + (h/alpha) e_w) - L(W)] / (h/alpha)\n  = (1/alpha) * dL/dw evaluated at W\n\nwhere e_w is the unit vector in the coordinate direction of w, and the approximation step uses the scale invariance to divide the whole argument by alpha.\n\n(it’s possible to be more rigorous about the above and about how much eps affects the quality of the approximation L(alpha W) ≈ L(W), but for simplicity we avoid doing so here).\n\nSo as expected, scaling W by a factor of alpha causes the gradients to scale by a factor of 1/alpha. Additionally, since with batch norm what matters is the scale of gradient steps relative to the existing magnitude of W, and W itself is still alpha times larger, this effectively scales the learning rate of W by a factor of 1/alpha^2.\n\nConsequences for Learning Rate\n\nWith batch norm removing any inherent constraint on the scale of W, absent any other constraint, we would expect W to naturally grow in magnitude over time through stochastic gradient descent. This is because a random walk’s distance from the origin grows in magnitude over time with very high probability (this is true even when batch normalization causes every gradient step in parameter space to have no locally inward or outward component, since we are taking finite-sized steps and a finite-sized step tangent to the surface of a sphere will end up slightly outside of that sphere).\n\nThen by the 1/alpha scaling of the gradient, this will in effect cause the learning rate to greatly decay over time. As the magnitude of W grows, the relative step sizes will shrink quadratically.\n\nSo without an L2 penalty or other constraint on weight scale, introducing batch norm will introduce a large decay in the effective learning rate over time. But an L2 penalty counters this.\n\nWith an L2 penalty term to provide weight decay, the scale of W will be bounded. If it grows too large, the multiplicative decay will easily overwhelm any outward motion due to random walking. 
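This tug-of-war between random-walk growth and weight decay can be illustrated with a toy simulation. The sketch below is hypothetical and not from the post: it takes a random "gradient" step tangent to the weight vector, with magnitude scaling as the inverse of the weight norm, mimicking the invariance and 1/alpha gradient scaling described above:

```python
import numpy as np

rng = np.random.default_rng(0)

def run(weight_decay, steps=2000, lr=0.1, dim=100):
    w = rng.normal(size=dim)
    w /= np.linalg.norm(w)           # start on the unit sphere
    norms = []
    for _ in range(steps):
        g = rng.normal(size=dim)
        g -= (g @ w) / (w @ w) * w   # scale invariance: no inward/outward component
        g /= np.linalg.norm(w)       # gradient magnitude varies as 1 / ||w||
        w = (1.0 - weight_decay) * (w - lr * g)
        norms.append(np.linalg.norm(w))
    return norms

no_decay = run(weight_decay=0.0)
decay = run(weight_decay=0.001)

# Without decay the norm grows steadily; with decay it settles at an equilibrium.
assert no_decay[-1] > 5 * no_decay[0]
assert decay[-1] < no_decay[-1]
```

Even though every step is purely tangent, the finite step size pushes the undecayed weights steadily outward, while a small proportional decay holds the norm at a bounded equilibrium.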
In the limit of training for a very long time at a fixed nominal learning rate, one would expect the scale of W to tend toward an equilibrium level where the expansion due to random walking on average precisely balances out the weight decay. This prevents the gradient, and therefore the effective learning rate, from decaying over time.\n\nSummary\n\nSo to a first-order approximation, once you are using batch normalization in a neural net, an L2 objective penalty term or weight decay no longer contributes in any direct manner to the regularization of layers that precede a batch norm layer. Instead, it takes on a new role as the unique control that prevents the effective learning rate from decaying over time.\n\nThis could of course itself result in better regularization of the final neural net, as maintaining a higher learning rate for longer might result in a broader and better-generalizing optimum. But this would be a result of the dynamics of the higher effective learning rate, rather than the L2 objective penalty directly penalizing worse models.\n\nOf course, this analysis does not hold for any layers in a neural net that occur after all batch normalization layers, for example typically the final fully-connected layers in common architectures. In those layers, obviously the normal regularization mechanism applies. Other variations on architecture might also affect this analysis. And as mentioned near the start of this post, if you are using an optimizer other than stochastic gradient descent (or stochastic gradient descent with momentum - the analysis is very similar), things might also be a little different.\n\nExperiment\n\nAs a demonstration of the above, in theory we should be able to closely replicate the effect of an L2 objective penalty in a batch-normalizing neural net purely by adjusting the learning rate in the various layers to perform the same learning-rate scaling that the weight decay would have resulted in. 
And we can do exactly that!\n\nUsing TensorFlow version 1.11, we train the ResNet-20 model (version 1, no preactivation) on CIFAR-10, based on code from the official TensorFlow model examples repo. Conveniently, the official example model already uses both batch normalization and an L2 objective penalty (with a hardcoded coefficient of 0.0002).\n\nAs a baseline, we train for 50 epochs with a learning rate of 0.1, then 50 epochs with 0.01, then 50 epochs with 0.001, leaving other hyperparameters untouched from defaults. Additionally, we train a second model where we remove all convolutional layers from the L2 objective penalty (but not the final layers of the neural net, since all convolutional layers are followed by a batch normalization layer but the final “head” layers are not).\n\nHere is a plot of the test set prediction accuracy of the resulting models over the course of training:\n\n\n\nThe model without the L2 penalty (“NoConvL2”) ended up worse than the baseline, stabilizing around 89.5% rather than 91% accuracy. If the theory is correct that L2 in the presence of batch norm functions as a learning-rate scaling rather than a direct regularizer, then this worsened accuracy should be due to something that resembles a too-quick learning rate drop, rather than a similar-to-baseline training curve with merely somewhat worse overfitting. 
Without the L2 penalty to keep the scale of the weights contained, they should grow too large over time, causing the gradient to decay, effectively acting as a too-rapid learning rate decrease.\n\nThis is borne out by the following plot of the sum of squared weights in all convolutional layers for the two runs:\n\n\n\nAs well as by the magnitude of the average optimizer step on convolutional layers, divided by the norms of the weights for those layers:\n\n\n\n\n(sampled every quarter epoch and slightly smoothed with a quarter-epoch halflife)\n\n\n\n\nAs expected, without the L2 penalty the weights grew much faster, causing the relative step size to decay, dropping the speed of learning far too fast for the model to reach as good of a fit.\n\n(As an interesting note, it turns out that at least in these runs, the worse fit arguably manifests both as more underfitting and more overfitting! Drilling down reveals the NoConvL2 run had about an 0.024 logit or 7% larger final difference between training and test losses, suggesting worse overfitting, but the training loss itself was about 0.025 logits worse as well, suggesting some underfitting too.)\n\nNow for the fun part: theoretically, we should be able to restore the original training behavior without adding back the L2 penalty, by manually adjusting the learning rate for the convolutional layers to increase over time at precisely the right rate to counteract the weight growth and reproduce the learning rate of the baseline.\n\nAnd since with batch norm there should be no meaningful direct regularization effect from the L2 penalty that we will need to reproduce, theoretically we will not need to add any additional regularization to achieve the baseline accuracy again.\n\nLet’s try this. Since the effective step size on the convolutional layers would diminish over time inversely with the squared magnitude of the weights, we compute the squared magnitude of the weights and scale the gradients to compensate. 
A crude snippet of TensorFlow code in Python to do this looks roughly like:\n\nconv2d_sqsum = tf.add_n([\n  tf.reduce_sum(tf.square(tf.cast(v, tf.float32))) for v in tf.trainable_variables()\n  if (\"conv2d\" in v.name)\n])\ninitial_conv2d_sqsum = 800.0 # empirical initial value of conv2d_sqsum for Resnet-20\n\n# We will multiply gradients by this:\nconv_lr_factor = conv2d_sqsum / initial_conv2d_sqsum\n\n\n\nAdditionally, the average squared magnitude of the convolutional weights in the baseline run itself was not constant! So to replicate the baseline training, we also need to multiply the gradients by the inverse of that as well. We observe what this actually was in our baseline run, and then crudely approximate it with a piecewise linear function for the purposes of implementing it in TensorFlow, which is plotted below:\n\n\n\nGiving us this final hacky bit of code that we insert into the TensorFlow example code:\n\nconv2d_sqsum = tf.add_n([\n   tf.reduce_sum(tf.square(tf.cast(v, tf.float32))) for v in tf.trainable_variables()\n   if (\"conv2d\" in v.name)\n])\ninitial_conv2d_sqsum = 800.0 # empirical initial value of conv2d_sqsum for Resnet-20\n...\nconv_lr_factor = tf.where(epoch &lt; 10.0, (1.0 - 0.05 * epoch),           \n                 tf.where(epoch &lt; 50.0, (0.5 - 0.0025 * (epoch-10.0)),  \n                 tf.where(epoch &lt; 100.0,(0.4 + 0.006 * (epoch-50.0)),   \n                                        (0.7 + 0.001 * (epoch-100.0)))))\nconv_lr_factor *= conv2d_sqsum / initial_conv2d_sqsum\n\ngrad_vars = optimizer.compute_gradients(loss)\nscaled_grad_vars = []\nfor (g,v) in grad_vars:\n    if \"conv2d\" in v.name:\n        scaled_grad_vars.append((conv_lr_factor*g, v))\n    else:\n      scaled_grad_vars.append((g,v))\n\ngrad_vars = scaled_grad_vars\n...\n\n\n\nHere’s the resulting run attempting to replace the L2 objective term with this equivalent scaling:\n\n\n\nNot bad! The new run does have the same final accuracy as the baseline. 
However, the accuracy during the first learning rate regime is now a lot worse on average. Why is that?\n\nThis turns out to be because although we’ve closely replicated the training, at inference time the batch normalization layers use a moving average of statistics from training. During the first learning rate regime, our replicated training has model weights growing exponentially over time instead of maintaining a similar magnitude throughout, because rather than using an L2 penalty to bound their scale, we’re simply adjusting the learning rate to be ever larger to keep up. So the training modulo the scale of the weights is the same, but at inference time the batch norm moving averages will always be too small, as they can’t keep up with the exponential growth.\n\nIf we wanted, we could simply shrink the window for the moving averages a little to help them keep up (by changing the hardcoded “_BATCH_NORM_DECAY” constant in the TensorFlow example from 0.997 to 0.98). This has absolutely no effect on the training, but should improve the inference-time accuracy within the first learning rate regime:\n\n\n\nAnd indeed it does. 
In fact, it looks like we’ve overshot slightly - presumably in the baseline run the batch norm moving averages were already having difficulty keeping up due to the high learning rate alone, so with a shorter moving average window our purple test accuracy line is actually a little higher than the baseline orange in the first 50 epochs.\n\nHere’s a plot of the average relative step size for the first three training runs together again, which shows that our manual learning rate scaling has indeed replicated the step-size behavior of the original training:\n\n\n\n\n(sampled every quarter epoch and slightly smoothed with a quarter-epoch halflife)\n\n\n\n\nAnd here’s a plot of the magnitude of the convolutional weights for those runs, this time on a log scale:\n\n\n\nAs expected, the weights grow exponentially to some quite extreme values, far larger than baseline! This shows that the cumulative effect of weight decay over time on a batch-normalizing neural net, when viewed instead as a learning rate adjustment, can be massive.\n\nIn summary, an L2 penalty or weight decay on any layers preceding batch normalization layers, rather than functioning as a direct regularizer preventing overfitting of the layer weights, instead takes on a role as the sole control on the weight scale of those layers. This prevents the gradients, and therefore the “effective” learning rate for those layers, from decaying over time, making weight decay essentially equivalent to a form of adaptive learning rate scaling for those layers.\n\n\nEdit 2019-01-30: this post originally characterized the result of removing L2 and dropping the effective learning rate too quickly as just underfitting, but a reader pointed out (thanks!) that this also often gives overfitting as well, since one can get final models with worse generalization even when the final training loss is similar.\n\n\n",
        "url"      : "https://blog.janestreet.com/l2-regularization-and-batch-norm/",
        "image"    : "https://blog.janestreet.com/l2-regularization-and-batch-norm/l2-batch-norm_19b.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "A tutorial for building web applications with Incr_dom",
        "date"     : "January 15, 2019",
        "authorId" : "jsomers",
        "author"   : "James Somers",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "At Jane Street, our web UIs are built on top of an in-house framework\ncalled Incr_dom, modeled in\npart on React’s virtual\nDOM. Rendering different\nviews efficiently in response to changes made to a shared model is a\nquintessentially incremental computation—so it should be no surprise\nthat Incr_dom is built on top of\nIncremental.\n\nTo date, the documentation for Incr_dom has been limited to a few\nblog\nposts,\na talk and\nthe API\ndocs. If\nyou’ve wanted to build an app with it, you’ve pretty much been on your\nown.\n\nBut we’re happy to announce that we’ve just released a couple of\nexample-driven\ntutorials\nthat walk through the basics of building an Incr_dom app via a\nteardown of sample applications. You’ll learn how to quickly get\nstarted with the framework and how to take advantage of some of its\nmore sophisticated features.\n\nYou can find the tutorials on Github here: https://github.com/janestreet/incr_dom/blob/master/example/README.org.\n",
        "url"      : "https://blog.janestreet.com/a-tutorial-for-building-web-applications-with-incrdom/",
        "image"    : "https://blog.janestreet.com/a-tutorial-for-building-web-applications-with-incrdom/incr_dom.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "How to shuffle a big dataset",
        "date"     : "September 26, 2018",
        "authorId" : "chardin",
        "author"   : "Chris Hardin",
        "tags"     : [],
        "minsToRead" : 13,
        "content"  : "At Jane Street, we often work with data that has a very low\nsignal-to-noise ratio, but fortunately we also have a lot of data.\nWhere practitioners in many fields might be accustomed to\nhaving tens or hundreds of thousands of correctly labeled\nexamples, some of our problems are more like having a billion training\nexamples whose labels have only a slight tendency to be correct.\nThese large datasets present a number of interesting engineering\nchallenges.  The one we address here: How do you shuffle a really\nlarge dataset?  (If you’re not familiar with why one might need this,\njump to the section Why shuffle below.)\n\nFor a dataset x0 , . . . , xn - 1 that fits in RAM, you can shuffle using something like\nFisher–Yates:\nfor i = 0, ..., n - 2 do\n  swap x[i] and x[j], where j is a random draw from {i, ..., n - 1}\n\n\nBut what if your dataset doesn’t fit in RAM?\n\nI will present the algorithm I use for shuffling large datasets.  It\nisn’t novel, and one can find multiple\ninstances of\npeople\nreinventing\nit or something similar (and in essence it descends from\nRao).  However, I don’t know\nof anywhere that states the algorithm, shows why it’s correct, and\ngets into the particular practical issues we address below.  Also,\nwhen I first encountered this problem and searched online for an\nanswer, I didn’t find any of the good examples above, just lots of bad\nideas, so hopefully this post will improve the odds for the next\nperson.\n\nTo be clear, this is not some minor performance hack.  For large\ndatasets, it makes the difference between feasible and infeasible.\n(See appendix for a more quantitative comparison.)\n\nA 2-pass shuffle algorithm\nSuppose we have data\nx0 , . . . , xn - 1. \nChoose an M sufficiently large that a set of n/M points can be shuffled\nin RAM using something like Fisher–Yates, but small enough that you can have\nM open files for writing (with decent buffering). Create M “piles”\np0 , . . . 
, pM - 1\nthat we can write data to.  The mental model of a “pile” here is that it’s a\nfile you can append to, but in practice\nyou might, say, have several piles exist as datasets in the same HDF5\nfile.  The first pass of the algorithm is to split the data into these\nM piles, and the second pass shuffles each pile and appends it to\nthe final result.\n-- First pass\ncreate empty piles p[0], ..., p[M - 1]\nfor i = 0, ..., n - 1 do\n  j := uniform random draw from {0, ..., M - 1}\n  append x[i] to pile p[j]\n\n-- Second pass (perhaps done lazily)\nfor j = 0, ..., M - 1 do\n  shuffle p[j] in RAM with Fisher-Yates or whatever is convenient\n  append p[j] to output file\n\n\n\n\n\n\nExample of a shuffle: We start with unshuffled data (top); the first\npass leaves M=6 piles (middle); the second pass yields shuffled data (bottom).\n\n\n\n\nAssuming you have enough memory to satisfy the above constraint on\nM and assuming that drawing a random number is O(1), this is a\nlinear time algorithm; the constant factor is dominated by having to\nread and write each data point twice in external storage (but the\nreading/writing can be done in blocks rather than one point at a time).\nSince the reading and writing is stream-oriented, the algorithm still\nworks for data with variable record length.\n\nTo see that the 2-pass shuffle yields an unbiased random permutation,\nconsider another algorithm already known to be correct: draw\nU0 , . . . , Un - 1 ~ Uniform(0,1), associate xi with Ui, and\nsort by Ui; this yields an unbiased permutation.  Our algorithm\nabove can be seen to be equivalent to this: for M=1000, the choice\nof pile is like radix sorting on the first 3 digits of Ui, and then\nshuffling within each pile is like sorting on the remaining digits.\n\nDealing with oversized piles\nEven if the expected pile size would be\nsmall enough to shuffle in RAM, there is some chance of getting an\noversized pile that is too large to shuffle in RAM.  
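For concreteness, the two passes can be sketched in Python. This is a toy version that uses temporary files as piles and line-delimited string records; the function name and parameters are illustrative, and a real implementation would read and write in blocks rather than one record at a time:

```python
import os
import random
import shutil
import tempfile

def two_pass_shuffle(records, num_piles=8, seed=0):
    """2-pass shuffle: deal records into random piles on disk (first pass),
    then shuffle each pile in memory and concatenate (second pass)."""
    rng = random.Random(seed)
    tmpdir = tempfile.mkdtemp()
    piles = [open(os.path.join(tmpdir, "pile%d" % j), "w+") for j in range(num_piles)]
    for rec in records:                       # first pass: one streaming write
        piles[rng.randrange(num_piles)].write(rec + "\n")
    out = []
    for f in piles:                           # second pass: in-RAM shuffles
        f.seek(0)
        pile = f.read().splitlines()
        rng.shuffle(pile)
        out.extend(pile)
        f.close()
    shutil.rmtree(tmpdir)
    return out

data = [str(i) for i in range(10000)]
shuffled = two_pass_shuffle(data)
assert sorted(shuffled, key=int) == data      # a permutation of the input
assert shuffled != data                       # and (almost surely) not the identity
```

In practice the second pass would typically be done lazily, as discussed below, rather than eagerly concatenating all piles.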
You can make\nthe probability of getting an oversized pile very small: if expected\npile size is s, the stdev is slightly under\n√s, so you can just arrange for, say,\ns + 6√s\nto be a size that you can still shuffle in RAM.\nEven with M=1000, the chance\nthat some pile will be larger than expected by 6 stdevs is about\n10−6.\n(This 6√s business is just\na formality.  In practice, you just leave yourself what feels like a sufficient amount\nof headroom, and if you get an oversized pile, it’s overwhelmingly likely that\nyou overestimated how many points you could fit in memory rather than getting unlucky,\nand you try again with smaller pile size.)\n\nIn the rare case that you end up with an oversized pile, you could recursively\napply the algorithm to the oversized pile, but it’s also okay just to\nstart over.  Because the probability of having to restart is\nsmall, the expected runtime is only slightly increased.\nYou might worry that starting over would introduce some bias into the shuffle,\nbut—surprisingly,\nperhaps—it doesn’t, because the tuple of pile sizes that results from the first pass\nis independent of the permutation that is generated.\n(Consider the above way of thinking of the algorithm as\nassociating each point with some Ui and then sorting; if I tell\nyou how many of the Ui happened to fall in certain intervals, I\nstill haven’t given you any information about the relative ordering\namong the Ui.)\n\n\nA similar consideration applies if the way you are storing your data\nmakes it necessary or advantageous to preallocate the storage\nfor each pile: you preallocate\ns + 6√s\nfor each pile, on average waste\n6√s\nper pile, and very rarely have to restart if you exceed\nthe storage you had preallocated.\n\nParallelizing, and other practical considerations\nAs a practical matter, with very large data sets, the input is often\nbroken across several files rather than being in a single file, and it would be\ndesirable for the result of the shuffle to be 
broken across several\nfiles as well.  The above algorithm adapts naturally to this context.\n\n\n  \n    Suppose the input is spread across files\nX0 , . . . , XK - 1.  We do the\nfirst pass for each of these files in parallel, leaving many sets of\npiles\npk0 , . . . , pkM -\n1\nfor\nk = 0 , . . . , K - 1.\n  \n  \n    For j = 0 , . . . , M - 1, combine\np0j , . . . , pK - 1j\ninto pj.\n  \n  \n    Proceed with second pass as above.\n  \n\n\nCommonly, the data you are trying to shuffle was the output of some\npreprocessing step.  The first pass can be integrated into the\npreprocessing, so that the extra cost incurred by the first pass is\nnear zero: during preprocessing, where you would have written\npreprocessed data to one file, you instead write it to many piles.\n\nAlso, in practice, it can be handy to have the resulting chunks be\nsmall enough that they can be shuffled in RAM while also\ntraining your model.  Then, the second pass is done lazily: You\nonly shuffle the piles as they are loaded for training.  This is often\na net win, depending on how many times you are going to consume the\ndata without re-shuffling.  (Fancier still, if the piles are small\nenough that you can fit 2 in memory at the same time, you can have\na better input pipeline: while you are training on one pile, you start\nloading and shuffling the next one.)\n\nLeaving piles unshuffled also allows for another trick pointed\nout by my colleague David Wu: Suppose new\ndata is arriving at a roughly constant rate, and you want to maintain a moving\nwindow of length Y years.  Think of each pile as a circular buffer,\nwith its contents in chronological order.  As new data comes in, when\nyou write to a pile, you remove outdated data and append the new data.\nIn this way you can incrementally maintain a shuffled copy of the last\nY years of data.  
(Okay, it’s only a half-shuffled copy, but the\nremaining work is easy to do when you load each pile.)\n\n\n\nLeaving the data in many piles, rather than combining into a single\nmonolithic output, also allows you to get imperfect (but for many\npurposes good enough) reshuffles by permuting the order in which you\nload piles (and shuffling within each pile when you load it).\n\n\nWhy shuffle\nWhen training neural nets by stochastic gradient descent (or a variant thereof),\nit is common practice to shuffle the data.  Without getting bogged down in\na detailed discussion, let’s try to get a sense for why this\nshuffling is useful by considering an extreme example.  Suppose you are training\na classifier to tell cats from dogs, and your training set\nis 50,000 cats followed by 50,000 dogs.  If you don’t shuffle, you\nwill get poor training performance.  Strictly speaking the problem\narises from having serial correlation in the noise of your gradients,\ncombined with non-commutativity of parameter updates (if training on\nx and then y were equivalent to training on y and then x, then\nshuffling would have no effect); intuitively, your net will spend\n50,000 examples learning “everything’s a cat” followed by 50,000\nexamples learning “no, everything’s a dog,” and most of the finer\nstructure you might have learned along the way will get drowned out.\n\nIf you only locally shuffle (e.g., maintain a reservoir of 10,000\nexamples that you draw from randomly, which is replenished by\nstreaming through your dataset) then that could be sufficient if serial correlations\nin your data persist for much fewer than 10,000 examples, but it would be insufficient\nin our 50,000 cat–50,000 dog example.\n\nThat’s not to say that shuffling is itself optimal.  E.g., you might get\nbetter training performance by making sure each consecutive pair of training\nexamples has one cat and one dog (though we’ve found there are other problems that\ncrop up with this idea).  
Or, there are approaches like\ncurriculum learning (Bengio et al.).\n\nAppendix: Performance comparison\nThe 2-pass shuffle seemed so obviously better than random access into\na file that I hadn’t bothered to measure how much faster it actually\nis.  One approach works, the other doesn’t, what’s there to measure?\nBut the post was met with a lot of skepticism about whether it is\nfaster at all, apparently on the basis that the 2-pass algorithm has\nan extra read/write and SSDs are fast.  So I measured the difference\nand found that, for my data and how it is stored, the 2-pass approach\nis 1000 times as fast as random access (and that’s before\nincorporating further improvements to the 2-pass approach that are\ndone in practice, which are to parallelize the first pass and\nintegrate it with the data preprocessing).  If this sounds too good to\nbe true, bear in mind that this is not a comparison to some\nhighly-regarded practice; it is a comparison to a bad idea, like\nquicksort against bubblesort.\n\nEven with uncompressed data on local SSDs, sequential traversals are\n48 times as fast as random access traversals for my data.\n\nObviously the performance gap will depend on how large your training\nexamples are, your storage setup, what file format you’re using,\nwhether the data is compressed, and so on.  In particular, if\nindividual examples are very large (500kB each?) then random access\ncould be competitive.\n\nThe dataset I tested this on is 220 million examples, 9kB each. It\nwould be 2TB uncompressed.  It is 320GB compressed (4 HDF5 files, 80GB\neach, using HDF5’s internal compression).  If I try to traverse the\ndata by grabbing one random example at a time, it takes 394,000μs\nper example (random access into compressed 80GB files is SLOW). At\nthat rate, it would take 2.75 years to traverse the data once.\n(That’s not doing anything obviously wrong like reopening the file for\neach read—the four files are only opened once.  
The only\nobviously wrong thing it’s doing is trying to traverse the data via\nrandom access.)\n\nBy comparison, reading the data sequentially in big blocks, it takes\n120μs/example, and a single traversal of the dataset takes 7.3\nhours.  Taking into account the fact that with the 2-pass algorithm you\nhave to read each data point twice and do an intermediate write, it\ntakes about a day, starting from unshuffled data, to do a random\ntraversal.  This is a 1000x speedup over random access, without\nincorporating anything like parallelizing the first pass, or\npiggybacking the first pass on top of whatever preprocessing you’re\nalready doing.  If I put some effort into optimizing the silly\napproach, I can get the factor to be smaller.  E.g., if I go to the\ntrouble of putting the data on local storage (a RAID array of SSDs in\nthis case), still compressed, and only reading from one file, it’s\n“only” a 460x speedup.  Using uncompressed data (I tested with a\nmemory-mapped .npy file) on locally attached SSD storage yields a\nhefty speedup for both approaches, with random reading taking\n720μs/example and sequential reading taking 15μs/example.\nThis narrows the gap, but not enough to make random access competitive.\n\nSo, the relative speed of sequential access more than compensates for\nthe cost of the first pass (which itself is negligible if you are\ngoing to preprocess the data anyway, as pointed out earlier).\nYou might wonder: even in RAM, sequential access is faster than random\naccess; does this mean that we can make in-memory shuffles faster\nusing an algorithm like this rather than Fisher–Yates \n(where RAM is the new disk, and cache is\nthe new RAM)?  According to the Sanders\npaper mentioned in the\nintroduction, the answer is yes, and he claims a 4x speedup on\ncontemporary hardware.  
(Of course, in the context of our problem here,\nwhere the in-memory operations are cheap relative to getting stuff off\nthe disk, that 4x speed up for the in-memory shuffle would make little\ndifference for us.)\n",
        "url"      : "https://blog.janestreet.com/how-to-shuffle-a-big-dataset/",
        "image"    : "https://blog.janestreet.com/how-to-shuffle-a-big-dataset/shuffle_zoom.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Deep learning experiments in OCaml",
        "date"     : "September 20, 2018",
        "authorId" : "lmazare",
        "author"   : "Laurent Mazare",
        "tags"     : [],
        "minsToRead" : 12,
        "content"  : "Last year we held a machine learning seminar in our London office,\nwhich was an opportunity to reproduce some classical deep learning\nresults with a nice twist: we used OCaml as a programming language\nrather than Python. This allowed us to train models defined in a\nfunctional way in OCaml on a GPU using TensorFlow.\n\nSpecifically we looked at a computer vision application, Neural Style\nTransfer, and at character-level language modeling.\n\nOCaml Bindings for TensorFlow\n\nTensorFlow is a numerical library using\ndata flow graphs. It was originally developed by Google and was\nopen-sourced at the end of 2015. Using it, one can easily specify a\ncomputation graph and execute it in an optimized way on multiple CPUs\nor GPUs. A typical use case is neural networks, which are easily\nrepresented as computation graphs. TensorFlow can then compute some\nsymbolic gradients for the graphs which makes it easy to minimize some\nloss functions.\n\nSome TensorFlow OCaml\nbindings make it possible\nto use TensorFlow from OCaml programs. 
These bindings also provide an API that\nmakes it easy to describe complex neural network architecture in a functional\nway.\n\nTo illustrate this here is a simple implementation of\nVGG-19, a classical\nConvolutional Neural Network (CNN) architecture from 2014.\n\nlet vgg19 () =\n  let block iter ~block_idx ~out_channels x =\n    List.init iter ~f:Fn.id\n    |&gt; List.fold ~init:x ~f:(fun acc idx -&gt;\n      Fnn.conv2d () acc\n        ~name:(sprintf \"conv%d_%d\" block_idx (idx+1))\n        ~w_init:(`normal 0.1) ~filter:(3, 3) ~strides:(1, 1) ~padding:`same ~out_channels\n      |&gt; Fnn.relu)\n    |&gt; Fnn.max_pool ~filter:(2, 2) ~strides:(2, 2) ~padding:`same\n  in\n  let input, input_id = Fnn.input ~shape:(D3 (img_size, img_size, 3)) in\n  let model =\n    Fnn.reshape input ~shape:(D3 (img_size, img_size, 3))\n    |&gt; block 2 ~block_idx:1 ~out_channels:64\n    |&gt; block 2 ~block_idx:2 ~out_channels:128\n    |&gt; block 4 ~block_idx:3 ~out_channels:256\n    |&gt; block 4 ~block_idx:4 ~out_channels:512\n    |&gt; block 4 ~block_idx:5 ~out_channels:512\n    |&gt; Fnn.flatten\n    |&gt; Fnn.dense ~name:\"fc6\" ~w_init:(`normal 0.1) 4096\n    |&gt; Fnn.relu\n    |&gt; Fnn.dense ~name:\"fc7\" ~w_init:(`normal 0.1) 4096\n    |&gt; Fnn.relu\n    |&gt; Fnn.dense ~name:\"fc8\" ~w_init:(`normal 0.1) 1000\n    |&gt; Fnn.softmax\n    |&gt; Fnn.Model.create Float\n  in\n  input_id, model\n\n\n\nAs an example of Recurrent Neural Network (RNN) a Long Short-Term\nMemory (LSTM) unit is\nalso pretty easy to define.\n\nlet lstm ~size_c ~size_x =\n  let create_vars () =\n    Var.normal32 [ size_c+size_x; size_c ] ~stddev:0.1,\n    Var.float32 [ size_c ] 0.\n  in\n  let wf, bf = create_vars () in\n  let wi, bi = create_vars () in\n  let wC, bC = create_vars () in\n  let wo, bo = create_vars () in\n  Staged.stage (fun ~h ~x ~c -&gt;\n    let h_and_x = concat one32 [ h; x ] in\n    let c =\n      sigmoid (h_and_x *^ wf + bf) * c\n      + sigmoid (h_and_x *^ wi + bi) * tanh 
(h_and_x *^ wC + bC)\n    in\n    let h = sigmoid (h_and_x *^ wo + bo) * tanh c in\n    `h h, `c c)\n\n\n\nAdding some Type Safety to TensorFlow\n\nOn the OCaml side a node of the TensorFlow computation graph has a type\n'a Node.t where 'a represents the kind of value that the node contains\nencoded as a polymorphic\nvariant.\nFor example a node could have type [`float] Node.t if the associated value\nis a tensor of single precision floating point values.\n\nThe OCaml code wrapping TensorFlow operations can then specify the kind of\nnodes that are used and returned depending on the operation. For example the\nsquare root function has the following signature:\n\nval sqrt\n  :  ([&lt; `float | `double ] as 't) t\n  -&gt; ([&lt; `float | `double ] as 't) t\n\n\n\nThis signature specifies that sqrt takes as input a single node containing\nsingle or double precision floats. The resulting node contains values of the\nsame kind. Another example could be the greaterEqual comparison operator\nwhose signature is:\n\nval greaterEqual\n  :  ([&lt; `float | `double | `int32 | `int64 ] as 't) t\n  -&gt; ([&lt; `float | `double | `int32 | `int64 ] as 't) t\n  -&gt; [ `bool ] t\n\n\n\nThis specifies that greaterEqual takes as input two nodes. These\nnodes have to contain the same kind of values, and this kind can be single or\ndouble precision floats or 32/64 bit integers. This operation then returns\na tensor of boolean values.\n\nSo thanks to these signatures sqrt (greaterEqual n1 n2)\nwould raise a compile time error as the input of sqrt contains boolean\nvalues.\n\nSpecifying these function signatures manually would be tedious so this\nwrapping OCaml code ends up being generated automatically from the operation\ndescription file provided in the TensorFlow distribution.\n\nThe Node.t type does not encode the dimensions or even the shape of the\nembedded tensor. 
Including some shape information could probably be useful\nbut would result in more complex function signatures.\n\nUsing a Convolutional Neural Network for Neural Style Transfer\n\nNeural Style Transfer was originally introduced by Gatys, Ecker, and Bethge in\n2015 in A Neural Algorithm of Artistic\nStyle. This paper describes how to combine\na content image and a style image to generate a new image.\n\nThe idea is to use a CNN to extract some features from an image. Informally a\nCNN stacks multiple layers between the input image and its output (which\ncould for example be the class the input image belongs to). The layers that are\nclose to the input image extract low level features, e.g. a vertical line. The\nones that are further away extract higher level features, e.g. fur if it comes\nto recognizing an animal or a tire if it comes to recognizing a car.\n\nAs we want to have the same content as the content image, the final image\nshould have similar high level features at the same positions. For style, the\nfinal image should have the same low level features as the style image, but\nposition is not important here. We just care about having the same intensity on\nthe different low level features.\n\nIn order to experiment with this we used a pre-trained VGG-19 model. The actual\nimplementation and details on how to run it can be found on\nGitHub.\n\nAs an example we took as input a camel picture from\nwikimedia and\nas style image Van Gogh’s famous The Starry\nNight. A generated image\nusing Neural Style Transfer can be seen below.\n\n\n\nGenerating OCaml Code using a Recurrent Neural Network\n\nAndrej Karpathy wrote a very nice blog\npost on how RNNs can\nbe used for character based language modeling. In this setting an RNN is trained\nto predict the next character probability distribution on some very large\ncorpus. 
Once trained the RNN is given some initial empty memory and is used to\nrandomly generate a character based on the next character probability\ndistribution. The generated character is then given as input to the RNN as well\nas the updated RNN memory to generate the following character and so on.\n\nMore details are available in the original blog post and can be seen in the\nblog post implementation. As a fun\napplication, the author used a corpus made of all the works of Shakespeare and\nwas able to generate samples that look very similar to some actual Shakespeare.\n\nOnce again we wrote an OCaml based implementation of the same idea; it uses\ntwo LSTMs stacked on top of each other. The actual implementation can be found on\nGitHub\nas well as instructions on how to run it. As an example we trained a network on\nthe concatenation of the OCaml files from the Base GitHub\nrepo. This resulted in a corpus of 27\nthousand lines. We then used the trained network to generate some random text\nthat looks like OCaml code.\n\n  (** Generic date don't only then of the set operations *)\n\nlet to_int x = Conv.int64_to_int32_exn x.count arg.init ts\n\n  let init =\n    match state with\n    | `Ok x -&gt; x\n    | `Duplicate_key key -&gt;\n      Or_error.error_string \"int64_negative_exn\" l1 l2 l3 ~f:(fun x -&gt;\n      let new_acc, y = f !acc x in\n      acc := new_acc;\n      y)\n  in\n  !acc, result\n;;\n\nlet partition_tf t ~f =\n  let t = create () in\n  enqueue t a;\n  t\n;;\n\n\n\n\nThe resulting code is far from compiling but it gets a bunch of things\nright, like comments being properly opened and closed and containing\ntext that looks like English.\n\nConcluding Remarks\n\nIt was fun to play with OCaml and replicate some well known results in\nthis setting.  
However, it should be noted that TensorFlow in OCaml has\nsome rough edges.\n\n\n  The machine learning library ecosystem in OCaml is nowhere near as\nwell developed as it is in Python.\n  There are very few resources for learning how to do machine learning\nin OCaml, which is in stark contrast to Python.\n  Shape errors are difficult to debug, since the OCaml bindings don’t\nprovide proper line numbers for where the mismatch is coming from.\nHopefully this could be fixed by switching the bindings to use the\nnew TensorFlow eager mode.\n\n\nThat being said, using OCaml has some nice properties too.\n\n\n  Type-safety helps you ensure that your training script is not going\nto fail after a couple hours because of some simple type error.\n  As noted by Christopher Olah in this blog\npost there are some\nnatural typed representations of neural networks which work nicely with\nfunctional programming languages, especially for Recurrent Neural Networks.\n\n\nThe banner at the top of this page has been generated based on this wikimedia\n  image.\n",
        "url"      : "https://blog.janestreet.com/deep-learning-experiments-in-ocaml/",
        "image"    : "https://blog.janestreet.com/deep-learning-experiments-in-ocaml/camel.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "What the interns have wrought, 2018 edition",
        "date"     : "August 6, 2018",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : ["ocaml","internship","incremental","ui"],
        "minsToRead" : 15,
        "content"  : "Yet again, intern season is coming to a close, and so it’s time to\nlook back at what the interns have achieved in their short time with\nus.  I’m always impressed by what our interns manage to squeeze into\nthe summer, and this year is no different.\n\nThere is, as you can imagine, a lot of ground to cover. With 45\ninterns between our NY, London and Hong Kong offices, there were a lot\nof exciting projects. Rather than trying to do anything even remotely\nexhaustive, I’m just going to summarize a handful of interesting\nprojects, chosen to give a sense of the range of the work interns do.\n\nThe first project is about low-level networking: building the bones of\na user-level TCP/IP stack.  The second is more of a Linux-oriented\nsecurity project: building out support for talking to various kernel\nsubsytems via netlink sockets, to help configuration and management of\nfirewalls. And the last is a project that I mentored, which has to do\nwith fixing some old design mistakes in\nIncr_dom, our framework for\nbuilding efficient JavaScript Web-UIs in OCaml.\n\n(You should remember, every intern actually gets two projects, so this\nrepresents just half of what an intern might do here in a summer.)\n\nReimplementing TCP/IP\n\nTrading demands a lot in performance terms from our networking gear\nand networking code.  Much of this has to do with how quickly\nexchanges generate marketdata.  The US equity markets alone can peak\nat roughly 5 million messages per second, and volumes on the options\nmarkets are even higher.\n\nFor that reason, we end up using some pretty high-performance 10G (and\n25G) network cards.  But fast hardware isn’t enough; it’s hard to get\nreally top-notch networking performance while going through the OS\nkernel.  For that reason, several of these cards have user-space\nnetwork stack implementations to go along with them.\n\nBut these implementations are a mixed bag. 
They work well, but the\nsubtle variations in behavior between vendors make it hard to build\nportable code.  And the need for these user-space layers to fit\ntraditional networking APIs means that it’s hard to get the maximum\nperformance that is achievable by the hardware.\n\nFor this reason, we’ve been finding ourselves spending more time\nwriting directly to lower-level, frame-oriented APIs that are exported\nby these cards. That’s relatively straightforward for a stateless\nprotocol like UDP, but TCP is a different beast.\n\nThat’s where intern Sam Kim came in. He spent half the summer reading\nover a copy of TCP/IP Illustrated (volumes\n1\nand\n2!),\nand building up a user-space TCP implementation in pure OCaml. He was\nable to leverage our existing APIs (and, critically, the testing\nframework we had in place for such protocols) to build up a new\nimplementation of the protocol, optimized for our environment of fast\nlocal LANs.  And he wrote a lot of tests, helping exercise many\ndifferent aspects of the code.\n\nThis is not a small amount of work. TCP is a complex protocol, and\nthere are a lot of details to learn, including connection setup,\nretransmission, and congestion control.\n\nOne of the more exciting moments of this project was at the end, when,\nafter doing all the testing, we connected Sam’s implementation to a\nreal network card and ran it. After some small mistakes in wiring it\nup (not Sam’s mistakes, I should mention!) it worked without a hitch,\nand kept on working after he added a bunch of induced packet drops.\nSurely there’s more work to do on the implementation, but it’s an\nauspicious start.\n\nTalking to the Kernel via Netlink\n\nWe have an in-house, Linux-based firewall solution called\nnap-enforcer, which relies on the built-in stateful firewall\nfunctionality in Linux’s netfilter subsystem.  
Part of this stateful\nfirewall support is the ability to keep track of the protocol state of\nconnections going through the firewall, i.e., connection tracking, or\nconntrack for short. Conntrack is necessary for the correct handling of\nstateful protocols, like FTP.\n\nWhen troubleshooting firewall issues, it’s helpful to be able to\ninspect and modify the tables that carry this state. We also want to\nbe able to subscribe to events from conntrack and generate log\nmessages for interesting changes, like a connection being open or\nclosed.\n\nThis functionality can be controlled via a netlink socket, which is\na special kind of socket that enables message-oriented communication\nbetween userspace processes and the kernel.\n\nInitially, we built nap-enforcer on top of the command-line\nconntrack utility. This worked well enough at first, but it doesn’t\nwork well for subscribing to streams of events, and conntrack itself\nhas some issues: it’s easy to crash it, and it’s inconsistent in its\nbehavior, which just makes it hard to use.\n\nCristian Banu’s project was to fix this by writing an OCaml library\nthat lets us talk directly to various kernel subsystems (primarily\nconntrack) over netlink sockets.\n\nThis is trickier than it might seem. Some of these interfaces are\nrather poorly documented, and existing C libraries don’t always offer\nvery convenient APIs, so a large part of the job was reading the Linux\nkernel code to understand what really is happening and then figuring\nout a convenient and type-safe way to make this functionality\navailable to OCaml. The resulting library offers a generic and safe\nhigh-level interface to netlink sockets, plus some abstractions built\non top for specific netlink-based protocols.\n\nOne tricky corner of a high-level netlink API is providing a safe\ninterface for constructing valid Netlink messages without making\nassumptions about the higher-level protocol. 
Cristian’s library wraps\nthose computations in an Atkey-style indexed\nmonad\nwhich guarantees that the underlying C library (libmnl) is used in a\nsafe way and that the resulting message is valid at the generic\nnetlink level.\n\nCristian also worked out a way to have repeatable automated tests for\nthe netlink library under our build system,\njenga. This is a non-trivial\nproblem because most of these kernel APIs require root access and\nkernel modules that aren’t loaded by default. His solution involves\nrunning tests in a network namespace with an owning user namespace\nthat maps the unprivileged user running the test suite to the root\nuser. This allows the test cases to use otherwise privileged\nnetwork-related system calls, but only on the subset of network\nresources governed by the testing namespace.\n\nThe project is not yet finished, but the results are very promising,\nand we hope to move this to production over the next few months.\n\nStreamlining Incr_dom\n\nFor a while now, we’ve been using a library we developed internally,\ncalled Incr_dom, for\nbuilding web front-ends in OCaml.\n\nYou can think of Incr_dom as a variation on\nReact or the Elm\nArchitecture, except with a\ndifferent approach to performance.  A key feature of React and Elm is\nthat they let you express your UI via simple data-oriented models plus\nsimple functions that do things like compute the view you want to\npresent, typically in the form of a so-called virtual\nDOM.\n\nWhat Incr_dom adds to the mix is a lot of power to optimize the\ncomputations that need to be done when doing things like computing the\nview given the current value of the model. (Elm and React both have\nnice approaches to this as well, though they err on the side of having\nan easier to use optimization framework that isn’t as powerful.)  
This\nis important to us because of the nature of our business: trading\napplications often have complex, fast-changing models, and being able\nto render those efficiently is of central importance.\n\nThat’s why Incr_dom is built on Incremental, a library whose entire\npurpose is optimization.  Incremental is good at constructing, well,\nincremental\ncomputations,\ni.e., computations that only need to do a small amount of work when\nthe input changes in small ways. The key is that Incremental lets you\nwrite your code so that it reads like a simple all-at-once\ncomputation, but executes like a hand-tuned, incremental one.\nIncremental computations are very useful when constructing UIs in this\nstyle, since your data model doesn’t typically change all at once.\n\nI’ve written\nmore than a\nfew blog\nposts\nabout the basic ideas, and since then, we actually had some interns do\nmuch of the\nwork\nof getting it up and running.  But that initial design had some sharp\nedges that we didn’t know how to fix. And that’s where Jeanne Luning\nPrak’s project this year came in.\n\nThe key problem with the original design was something called the\n“derived model”. To understand where the derived model comes into\nplay, you need to know a bit more about Incr_dom.  An Incr_dom app\nneeds to know how to do more than how to render its model. Here’s a\nsimplified version of the interface that a simple Incr_dom app needs\nto satisfy which shows a bit more of the necessary structure.\n\nmodule type App = sig\n  type model\n  type action\n  \n  val view : model Incr.t -&gt; schedule:(action -&gt; unit) -&gt; Vdom.t Incr.t\n\n  val apply_action : model -&gt; action -&gt; model\nend\n\n\n\nThe view function is what we described above. It takes as its input\nan incremental model, and returns an incremental virtual-dom\ntree. 
Note that it also takes a function argument, called schedule,\nwhose purpose is to allow the virtual-dom to have embedded callbacks\nthat can in turn trigger actions that update the model.  This is\nessentially how you wire up a particular behavior to, say, a button\nclick.\n\nThose actions are then applied to the model using the provided\napply_action function. This all works well enough for cases where\nthe required optimization is fairly simple. But it has real\nlimitations, because the apply_action function, unlike the view\nfunction, isn’t incremental.\n\nTo see why this is important, imagine you have a web app that’s\nrendering a bunch of data in a table, where that table is filtered and\nsorted inside of the browser. The filtering and sorting can be done\nincrementally in the view function, so that changing data can be\nhandled gracefully. But ideally, you’d like for the apply_action\nfunction to have access to some of the same data computed by view.\nIn particular, if you define an action that moves you to the next row,\nthe identity of that row depends on how the data has been sorted and\nfiltered.  And you don’t want to recompute this data every time\nsomeone wants to move from one row to the next.\n\nIn the initial design, we came up with a somewhat inelegant solution,\nwhich was to add a new type, the derived model, which is computed\nincrementally, and then shared between the view and apply_action\nfunctions.  The resulting interface looks something like this:\n\nmodule type App = sig\n  type model\n  type derived_model\n  type action\n  \n  val derive : model Incr.t -&gt; derived_model Incr.t\n\n  val view\n    :  model Incr.t\n    -&gt; derived_model Incr.t\n    -&gt; schedule:(action -&gt; unit)\n    -&gt; Vdom.t Incr.t\n\n  val apply_action\n    :  model\n    -&gt; derived_model\n    -&gt; action\n    -&gt; model\nend\n\n\n\nAnd this works. 
You can now structure your application so that the\ninformation that both the view and the action-application function\nneed to know can be shared in this derived model.\n\nBut while it works, it’s awkward. Most applications don’t need a\nderived model, but once any component needs to use it, every\nintermediate part of your application now has to think about and\nhandle the derived model as well.\n\nI came into the summer with a plan for how to resolve this issue. On\nsome level, what we really want is a compiler optimization.  Ideally,\nboth view and apply_action would be incremental functions, say,\nwith this signature:\n\nmodule type App = sig\n  type model\n  type action\n  \n  val view : model Incr.t -&gt; schedule:(action -&gt; unit) -&gt; Vdom.t Incr.t\n\n  val apply_action : model Incr.t -&gt; action Incr.t -&gt; model Incr.t\nend\n\n\n\nThen, both apply_action and view can independently compute what\nthey need to know about the row structure, and do so incrementally. At\nthat point there’s only one problem left: these computations are\nincremental, but they’re still being duplicated.\n\nBut that’s easy enough to fix, I thought: we can do some form of\nclever common-subexpression elimination. The basic idea was to cache\nsome computations in a way that when view and apply_action tried\nto compute the very same thing, they would end up with a single copy\nof the necessary computation graph, rather than two.\n\nThis turned out to be complicated for a few reasons, one of them being\nthe rather limited nature of JavaScript’s support for weak references,\nwhich are needed to avoid memory leaks.\n\nLuckily, Jeanne had a better idea. Rather than some excessively clever\ncomputation-sharing, we could just change the shape of the\nAPI. Instead of having separate functions for view and\napply_action, we would have one function that computed both. To that\nend, she created a new type, a Component.t, which had both the view\nand the apply_action logic.  
The type is roughly this:\n\nmodule Component : sig\n   type ('model,'action) t =\n      { view : Vdom.t\n      ; apply_action : 'action -&gt; 'model }\nend\n\n\n\nAnd now, the app interface looks like this:\n\nmodule type App = sig\n  type model\n  type action\n  \n  val create\n    :  model Incr.t\n    -&gt; schedule:(action -&gt; unit)\n    -&gt; (model,action) Component.t Incr.t\nend\n\n\n\nBecause create is a single function, it can, behind the scenes,\nstructure the computation any way it wants, and so can share work\nbetween the computation of the view and the computation of the\naction-application function.\n\nThis turned out to be a really nice design win, totally eliminating\nthe concept of the derived model and making the API a lot simpler to\nuse.  And she’s gotten to see the full lifecycle of the project:\nfiguring out how to best fix the API, implementing the change, testing\nit, documenting it, and figuring out how to smash the tree to upgrade\neveryone to the new world.\n\nAnd actually, this is only about half of what Jeanne did in this half\nof the summer. Her other project was to write a syntax extension to\ncreate a special kind of incremental pattern-match, which has\napplications for any use of Incremental, not just for UIs. That should\nmaybe be the subject of another blog post.\n\nApply to be an intern!\n\nI hope this gives you a sense of the nature and variety of the work\nthat interns get to do, as well as a sense of the scope and\nindependence that they get in choosing how to tackle these problems.\n\nIf this sounds like a fun way to spend the summer, you should\napply! And in\ncase you’re wondering: no, you don’t need to be a functional\nprogramming wizard, or have ever programmed in OCaml, or know anything\nabout finance or trading, to be an intern.  Most of our interns come\nin with none of that, and they still do great things!\n",
        "url"      : "https://blog.janestreet.com/what-the-interns-have-wrought-2018/",
        "image"    : "https://blog.janestreet.com/what-the-interns-have-wrought-2018/smelting.jpg",
        "topic"    :  ["technology","ocaml","internship","incremental","ui"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Plans for OCaml 4.08",
        "date"     : "June 29, 2018",
        "authorId" : "lwhite",
        "author"   : "Leo White",
        "tags"     : [],
        "minsToRead" : 17,
        "content"  : "With the external release of OCaml 4.07.0 imminent, we in Jane Street’s\nTools & Compilers group have been planning what we want to work on for\ninclusion in OCaml 4.08. These days OCaml uses (or at least attempts) a\ntime-based release process with releases scheduled every 6 months. We’re\ntrying to avoid rushing in changes at the last minute – as we’ve been\nprone to do in the past – so this list is restricted to things we could\nconceivably finish in the next 4-5 months.\n\nThis blog post is part of a drive towards making OCaml compiler\ndevelopment – both inside and outside of Jane Street – more\ntransparent to the wider OCaml community. By its nature this post is\ntechnical and very OCaml-specific. It has no narrative or moral, no\nbeginning, middle and end, it is just a list of language\nfeatures. Hopefully however, for those of you interested in OCaml’s\ndevelopment, it is at least an interesting list.\n\nSupport for GADTs in or-patterns\n\nCurrently, you cannot use GADT constructors with or-patterns. For\nexample:\n\n# type 'a ty = Int : int ty | Bool : bool ty | String : string ty;;\n\ntype 'a ty = Int : int ty | Bool : bool ty | String : string ty\n\n# let is_string : type a. a ty -&gt; bool = function\n    | Int | Bool -&gt; false\n    | String -&gt; true;;\n\n    Characters 54-57:\n      | Int | Bool -&gt; false\n        ^^^\nError: This pattern matches values of type int ty\n       but a pattern was expected which matches values of type a ty\n       Type int is not compatible with type a\n\n\n\nwhich forces you to duplicate the match case for the different\nconstructors:\n\n# let is_string : type a. 
a ty -&gt; bool = function\n    | Int -&gt; false\n    | Bool -&gt; false\n    | String -&gt; true;;\n\nval is_string : 'a ty -&gt; bool = &lt;fun&gt;\n\n\n\nThomas Refis has a patch to allow GADTs to be matched inside of\nor-patterns, allowing the above code to type-check, and we would like to\nget that merged for 4.08.\n\nNote that this patch does not allow type equations introduced by the\nconstructors to be used in the match case. For example, one could\nimagine allowing the following code:\n\ntype ('a, 'b) ty_pair =\n  | Int_int : (int, int) ty_pair\n  | Int_bool : (int, bool) ty_pair\n  | Bool_int : (bool, int) ty_pair\n  | Bool_bool : (bool, bool) ty_pair\n\nlet fst_default : type a b. (a, b) ty_pair -&gt; a = function\n  | Int_int | Int_bool -&gt; 0\n  | Bool_int | Bool_bool -&gt; false\n\n\n\nsince both Int_int and Int_bool add the same equation on a (a =\nint) which is sufficient to type-check the match case. However\nsupporting this in the general case is very tricky. We have some ideas\nabout how to do it but they won’t be ready for 4.08.\n\nImprove type propagation in lets\n\nCurrently code like this fails to compile:\n\ntype t1 = T\ntype t2 = T\nlet foo () = (T : t1)\nlet T = foo ()\n\n\n\nbecause the pattern T is type-checked before the expression foo ().\n\nI think this was originally done this way so that code like:\n\ntype t1 = T\ntype t2 = T\nlet x : t1 = T\n\n\n\nwould work because the : t1 annotation was considered part of the\npattern.\n\nRecently the handling of : t1 changed so that it was copied onto both\nthe pattern and the expression during type-checking. This means we don’t need to\ntype-check the pattern first anymore, but also creates its own\ndifficulties, especially for ppx writers. 
There are also some odd corner\ncases because polymorphic annotations on lets are treated differently.\n\nWe plan to change the AST to have an explicit representation for:\n\nlet pat : typ = exp\n\n\n\nand change the order of type checking so that it goes: typ =&gt; exp =&gt;\npat, which seems to be the most natural order.\n\nShadowing of items from “include”\n\nCurrently, to extend a module you must do something like:\n\ninclude (Foo : module type of struct include Foo end with module Bar := Foo.Bar)\nmodule Bar = struct\n  include Foo.Bar\n  let baz = 42\nend\n\n\n\nwhich is a bit of a mouthful. This is because you cannot define two\nmodules with the same name within a single module. Whilst there are\nvarious reasons not to allow module shadowing, most of them don’t apply\nto modules that come from an include statement. So we would like to\nadd the ability to shadow modules, types, etc. that have come from an\ninclude statement. This will allow the above to be written as:\n\ninclude Foo\nmodule Bar = struct\n  include Foo.Bar\n  let baz = 42\nend\n\n\n\nPrivate structure items\n\nIt would be convenient to be able to define type aliases or modules for use within a\nmodule, without adding them to the module itself (and thus requiring a signature to\nremove them). We would like to add support for declarations such as:\n\nprivate type tmp = t list\nprivate module M = F(X)\n\n\n\nthat will not appear in the surrounding module.\n\nNote that this isn’t intended as any kind of replacement for mli files –\nmli files should always be used. 
It is really intended for use in\nsub-modules where the benefit of a full signature might be minimal.\n\nIt will also be supported in signatures, allowing things like:\n\nmodule type S = sig\n  type t\n  module Sub : sig\n    private type outer = t\n    type t\n    val to_outer : t -&gt; outer\n  end\nend\n\n\n\nwhich avoids forcing modules of type S to define an outer type.\n\nOne could also imagine using this feature to fix “ocamlc -i” which, currently,\ncan sometimes print incorrect interfaces.\nFor example, running “ocamlc -i” on the following module:\ntype t = T\nmodule Sub = struct\n  type t = S\n  let to_outer S = T\nend\n\n\nproduces:\ntype t = T\nmodule Sub : sig\n  type t = S\n  val to_outer : t -&gt; t\nend\n\n\nwhich would be incorrect if put in an .mli.\n\nStrengthening the module system\n\nThe most subtle parts of OCaml’s module system all revolve around the\nequalities between different items in modules. For example, a common\nbeginner mistake is to write something like:\n\nmodule Foo : S = struct\n  type t = foo\n  ...\nend\n\n\n\nwhen they really wanted:\n\nmodule Foo : S with type t = foo = struct\n  type t = foo\n  ...\nend\n\n\n\nThe most confusing type errors from the module system come from when a\nmodule has unexpectedly lost an equality. In the example above it is\nprobably the programmer’s fault, but in many corner cases OCaml will\nunhelpfully remove an equality when it really shouldn’t.\n\nWe’re planning to include a number of features that involve OCaml\nkeeping more equalities on module types. A module type with more\nequalities is sometimes called a “stronger” module type – hence the\nabove title.\n\nUnfortunately, strengthening module types is not a backwards compatible\nchange. For example, changing the type of a functor from:\n\nmodule F (X : sig type t end) : ...\n\n\n\nto\n\nmodule F (X : sig type t = int end) : ...\n\n\n\nwould clearly break some uses of F. 
We feel that the short-term pain\nof any breakages caused by these changes is worth it for the long-term\ngain of removing a number of confusing corner cases from the language.\n\nTransparent ascription\n\nLast year, as part of his internship at Jane Street, Maciej Debski\nimplemented transparent ascription. Transparent ascription is an\noperation:\n\nmodule M = (N &lt;: S)\n\n\n\nthat restricts M to the elements of the module type S, but it is\nstill known to be equal to N. For example, M.t would be known to be\nequal to N.t and Map.Make(M).t would be known to be equal to\nMap.Make(N).t. This is in contrast to ordinary ascription:\n\nmodule M = (N : S)\n\n\n\nwhere M.t is not equal to N.t.\n\nThis feature is pretty useful on its own, but it’s mainly needed as a\nprerequisite for the other features in this list.\n\nAliases to functor parameters\n\nCurrently you cannot create a module alias for a functor parameter. For example:\n\nmodule F (X : S) = struct module M = X end\n\n\n\ncurrently has a type like:\n\nmodule F (X : S) : sig module M : sig type t = X.t ... end end\n\n\n\nThe addition of transparent module ascription will make it possible to\nhave module aliases for functor parameters. So the above example would\nhave type:\n\nmodule F (X : S) : sig module M = (X &lt;: S) end\n\n\n\nFix “with module”\n\nCurrently, the semantics of with module N = M are not to add an alias\nN = M, but to give N the strengthened module type of M. 
For\nexample:\n\nmodule M = struct\n  type t\n  module Inner : sig\n    type t\n  end\nend\n\nmodule type S = sig\n  module N : sig\n    module Inner : sig\n      type t\n    end\n  end\nend\n\nmodule type T = S with module N = M\n\n\n\nwill give\n\nmodule type T = sig\n  module N : sig\n    type t = M.t\n    module Inner : sig\n      type t = M.Inner.t\n    end\n  end\nend\n\n\n\nWe would like to strengthen this behaviour to give the alias instead:\n\nmodule type T = sig module N = M end\n\n\n\nIn addition we would support the syntax\n\nmodule type T = S with module N : R\n\n\n\nto extend the signature of sub-module N, which will allow the old\nbehavior of with module to be obtained using:\n\nmodule type T = S with module N : module type of M\n\n\n\nWe would also allow the syntax:\n\nmodule type T = S with module N = (M &lt;: R)\n\n\n\nto extend the signature with a transparent ascription. Note that this\nlast syntax, in combination with the support for aliasing functor\nparameters should finally allow with module to correctly specify the\nresult of a functor application.\n\nFix “module type of”\n\nCurrently module type of gives the unstrengthened module type. For example,\n\nmodule M = struct type t end\n\nmodule type S = module type of M\n\n\n\nproduces\n\nmodule type S = sig type t end\n\n\n\nThis causes all kinds of problems. We will change it to give the\nstrengthened module type instead, and support a [@weak] attribute to go\nback to the existing semantics. 
So the above example would give:\n\nmodule type S = sig type t = M.t end\n\n\n\nThis change somewhat depends on the other strengthening changes to be\ntractable, because it makes the requirements for a signature like:\n\nmodule N : module type of M\n\n\n\nstricter, and without the other changes many more [@weak] attributes are\nneeded to get things compiling again.\n\nFlambda\n\nFlambda is the name of a series of optimisation passes provided by the\nnative code compilers, which are not enabled by default.\n\nMuch of our recent effort around flambda has been focused on a\nmajor rewrite of some of its core components, imaginatively dubbed\nflambda 2.0. However, we do have some changes planned for the existing\nversion – mostly because they’re orthogonal to the flambda 2.0 changes\nand should be easy to port between the two versions.\n\nWork towards making flambda classic mode the default compilation mode\n\nThere were a number of improvements made to the compile-time performance\nof flambda’s -Oclassic mode in OCaml 4.07.0. We’re going to be\nbenchmarking the performance, both run-time and compile-time, of classic\nmode over the next few months. If the comparison with the current\ndefault (non-flambda) compilation mode is good then we would like to\nmake classic mode the default upstream. If it still needs more work then\nwe’ll be trying to get that done for 4.08 so that the default might be\nchanged in 4.09.\n\nImproved inlining heuristics for recursion\n\nAs part of his internship at Jane Street, Luke Maurer did some work to\ngive a proper semantics to how flambda handles recursion. In particular,\nthis work will give command-line arguments like -inline-max-depth and\nattributes like [@unroll 7] a well-defined meaning.\n\nImproved DWARF and GDB support\n\nMark Shinwell’s patches to dramatically improve DWARF output and GDB\nsupport have been waiting on review for a long time. 
We aim to get this\nreview done and the patches merged upstream for this release.\n\nMove the parser to Menhir\n\nMenhir is a parser generator for OCaml that is vastly superior to\nocamlyacc. In particular, it has much better support for producing\nuseful error messages.\n\nA pull-request (#292) was created\nyears ago by Gabriel Scherer and Frédéric Bour, to switch OCaml’s parser to\nMenhir. This work got resurrected recently and we are trying to lend a hand\nwith finishing and testing it.\n\nThe days of Error: Syntax error. may be numbered.\n\nAdd unsigned integer operations\n\nThere are open pull requests for both adding unsigned integer types and\nadding unsigned operations to the existing integer types. There is no\nconsensus at this point around adding actual unsigned integer types, but\nthere seems to be general agreement for at least providing the unsigned\noperations. We’ve had some demand for this internally, so we intend to\nhelp push this work along so it can be merged for 4.08.\n",
        "url"      : "https://blog.janestreet.com/plans-for-ocaml-408/",
        "image"    : "https://blog.janestreet.com/plans-for-ocaml-408/ocaml_release.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Repeatable exploratory programming",
        "date"     : "April 22, 2018",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 11,
        "content"  : "Expect tests are a technique I’ve written about\nbefore, but until recently, it’s been a\nlittle on the theoretical side. That’s because it’s been hard to take\nthese ideas out for a spin due to lack of tooling outside of Jane\nStreet’s walls.\n\nThat’s changed now, since Dune has\ngotten good support for using expect tests. Given that, I thought this\nwould be a nice time to demonstrate how expect-tests can be useful in\nsome ways you might not expect; in particular, as a way of doing\nexploratory programming.\n\nPreliminaries\n\nThe basic idea of an expect test is simple: expect tests let you\ngenerate output that is then captured and included in the source\nfile. To try this out, let’s first create a jbuild file for our\nlittle experiment.\n\n(jbuild_version 1)\n\n(library\n ((name foo)\n  (libraries (base stdio))\n  (inline_tests)\n  (preprocess (pps (ppx_jane)))\n))\n\n\n\nNote that you’ll have to opam install base, stdio and ppx_jane\nfor any of this to work. The inclusion of the (inline_tests)\ndeclaration is important here, as is the preprocessor line.\n\nNow, we can write a simple .ml file that uses the expect test\nframework to generate some output.\n\nopen! Base\nopen! Stdio\n\nlet%expect_test \"simple\" =\n  print_endline \"Hello Expect World!\"\n\n\n\nWe can then run this test and automatically capture the results by\nrunning dune (which is still, confusingly, called jbuilder at the\ncommand line.)\n\njbuilder runtest --auto-promote\n\n\n\nYou’ll now see the file change to the following as soon as the build\nis complete. (This is all more fun if your editor is set to\nauto-refresh.)\n\nopen! Base\nopen! Stdio\n\nlet%expect_test \"simple\" =\n  print_endline \"Hello Expect World!\";\n  [%expect {| Hello Expect World! |}]\n\n\n\nSmashing some HTML\n\nNow let’s get to the exploratory programming part.\n\nWe’ll demonstrate a classic exploratory programming task: munging an\nHTML file to get some useful data. 
In particular, let’s say we want to\nfind internal links on the\nopensource.janestreet.com site.\nWe’re going to use\nlambdasoup, which is a great\nlibrary for transforming HTML files.\n\nAfter installing lambdasoup via opam, we need to update our jbuild\nfile accordingly. We should also install and include support for a\nlibrary called expect_test_helpers_kernel, which provides some\nuseful tools for building expect tests.\n\n(jbuild_version 1)\n\n(library\n ((name foo)\n  (libraries (base stdio expect_test_helpers_kernel lambdasoup))\n  (inline_tests)\n  (preprocess (pps (ppx_jane)))\n))\n\n\n\nNow, we can write a little function for extracting links from an HTML\nfile, using lambdasoup.\n\nopen! Base\nopen! Stdio\nopen! Expect_test_helpers_kernel\n\nlet get_hrefs soup =\n  Soup.select \"a\" soup\n  |&gt; Soup.to_list\n  |&gt; List.map ~f:(Soup.R.attribute \"href\")\n\n\n\nWe can test this out by writing an expect test against a little\nexample.\n\nlet%expect_test \"soup\" =\n  let example = {|\n     &lt;html&gt;&lt;body&gt;\n       &lt;a href=\"http://janestreet.com\"&gt; A link! &lt;/a&gt;\n     &lt;/body&gt;&lt;/html&gt; |}\n  in\n  let hrefs = get_hrefs (Soup.parse example) in\n  print_s [%sexp (hrefs : string list)]\n\n\n\nNote that we use print_s from expect_test_helpers_kernel to format\nthe s-expression, and the %sexp syntax extension to generate the\ns-expression to print.  Again, if we run jbuilder again, the output\nwill be inserted into the file for us.\n\nlet%expect_test \"soup\" =\n  let example = {|\n     &lt;html&gt;&lt;body&gt;\n       &lt;a href=\"http://janestreet.com\"&gt; A link! &lt;/a&gt;\n     &lt;/body&gt;&lt;/html&gt; |}\n  in\n  let hrefs = get_hrefs (Soup.parse example) in\n  print_s [%sexp (hrefs : string list)];\n  [%expect {| (http://janestreet.com) |}]\n\n\n\nAt this point, it might occur to us to wonder what would happen if we\nhad an &lt;a&gt; element with no href. 
Well, we can just try that out.\n\nlet%expect_test \"soup\" =\n  let example = {|\n     &lt;html&gt;&lt;body&gt;\n       &lt;a href=\"http://janestreet.com\"&gt; A link! &lt;/a&gt;\n       &lt;a&gt; A broken link! &lt;/a&gt;\n     &lt;/body&gt;&lt;/html&gt; |}\n  in\n  let hrefs = get_hrefs (Soup.parse example) in\n  print_s [%sexp (hrefs : string list)];\n  [%expect {| (http://janestreet.com) |}]\n\n\n\nRerunning the test demonstrates that our code throws an exception in\nthis case.\n\nlet%expect_test \"soup\" =\n  let example = {|\n     &lt;html&gt;&lt;body&gt;\n       &lt;a href=\"http://janestreet.com\"&gt; A link! &lt;/a&gt;\n       &lt;a&gt; A broken link! &lt;/a&gt;\n     &lt;/body&gt;&lt;/html&gt; |}\n  in\n  let hrefs = get_hrefs (Soup.parse example) in\n  print_s [%sexp (hrefs : string list)];\n  [%expect {| DID NOT REACH THIS PROGRAM POINT |}];\n  [%expect {|\n    (* expect_test_collector: This test expectation appears to contain a backtrace.\n       This is strongly discouraged as backtraces are fragile.\n       Please change this test to not include a backtrace. 
*)\n\n    (\"A top-level expression in [let%expect] raised -- consider using [show_raise]\"\n     (Failure \"Soup.R.attribute: None\")\n     (backtrace (\n       \"Raised at file \\\"pervasives.ml\\\", line 32, characters 17-33\"\n       \"Called from file \\\"src/list.ml\\\", line 326, characters 13-17\"\n       \"Called from file \\\"test.ml\\\", line 17, characters 14-44\"\n       \"Called from file \\\"src/expect_test_helpers_kernel.ml\\\", line 475, characters 6-11\"))) |}]\n\n\n\nWe can fix this easily enough by changing the selector we use to only\nlook for &lt;a&gt; nodes with an href, as follows.\n\nlet get_hrefs soup =\n  Soup.select \"a[href]\" soup\n  |&gt; Soup.to_list\n  |&gt; List.map ~f:(Soup.R.attribute \"href\")\n\n\n\nAnd now, rerunning jbuilder will show that we get reasonable output\nonce again.\n\nlet%expect_test \"soup\" =\n  let example = {|\n     &lt;html&gt;&lt;body&gt;\n       &lt;a href=\"http://janestreet.com\"&gt; A link! &lt;/a&gt;\n       &lt;a&gt; A broken link! &lt;/a&gt;\n     &lt;/body&gt;&lt;/html&gt; |}\n  in\n  let hrefs = get_hrefs (Soup.parse example) in\n  print_s [%sexp (hrefs : string list)];\n  [%expect {| (http://janestreet.com) |}]\n\n\n\nAdding some real data\n\nWhat if we want to apply this to some real data? Let’s grab the\ncurrent contents of opensource.janestreet.com from the web and save\nit to a file called opensource.html. 
If we want our test to be able\nto read from this file, we need to add it as an explicit dependency,\nso we’ll adjust the jbuild file accordingly.\n\n(jbuild_version 1)\n\n(library\n ((name foo)\n  (libraries (base stdio expect_test_helpers_kernel lambdasoup))\n  (inline_tests ((deps (opensource.html))))\n  (preprocess (pps (ppx_jane)))\n))\n\n\n\nNow, we can add a new test, to see what our function does on\nopensource.html.\n\nlet%expect_test \"opensource\" =\n  let soup = In_channel.read_all \"opensource.html\" |&gt; Soup.parse in\n  let hrefs = get_hrefs soup in\n  print_s [%sexp (hrefs : string list)]\n\n\n\nAgain, if we run the test, the file will be updated to include the\noutput.\n\nlet%expect_test \"opensource\" =\n  let soup = In_channel.read_all \"opensource.html\" |&gt; Soup.parse in\n  let hrefs = get_hrefs soup in\n  print_s [%sexp (hrefs : string list)];\n  [%expect {|\n    (https://www.janestreet.com/ad-cookie-policy\n     https://opensource.janestreet.com/\n     https://github.com/janestreet\n     https://ocaml.janestreet.com/ocaml-core/latest/doc/index.html\n     https://github.com/janestreet\n     https://github.com/ocaml/dune\n     https://opensource.janestreet.com/base\n     https://opensource.janestreet.com/core\n     https://opensource.janestreet.com/async\n     https://opensource.janestreet.com/incremental\n     https://www.janestreet.com/technology/\n     https://blog.janestreet.com/\n     https://opensource.janestreet.com/contribute\n     https://janestreet.com/) |}]\n\n\n\nNow, we only wanted to extract the links that were actually on\nopensource.janestreet.com, and we got a bunch of other irrelevant\nlinks.  
To fix this, we need to analyze the URIs, so we’ll install the\nuri package from opam and add it to our jbuild, at which point we can\nchange the code as follows.\n\nlet%expect_test \"opensource\" =\n  let soup = In_channel.read_all \"opensource.html\" |&gt; Soup.parse in\n  let internal_links =\n    get_hrefs soup\n    |&gt; List.filter ~f:(fun uri -&gt;\n        let uri = Uri.of_string uri in\n        match Uri.host uri with\n        | None -&gt; false\n        | Some host -&gt; String.(=) host \"opensource.janestreet.com\")\n  in\n  print_s [%sexp (internal_links : string list)];\n  [%expect {|\n    (https://opensource.janestreet.com/\n     https://opensource.janestreet.com/base\n     https://opensource.janestreet.com/core\n     https://opensource.janestreet.com/async\n     https://opensource.janestreet.com/incremental\n     https://opensource.janestreet.com/contribute) |}]\n\n\n\nWhich gives us what we were looking for.\n\nWhat’s nice about this approach is that we’ve been able to do this all\nin a way that’s both lightweight and repeatable. We can take the code\nwe’ve written, commit it to the repo we’re working on, and anyone else\ncan try to extend our examples. What’s more, once the logic we want is\nfinished, it might make sense to leave in these little experiments as\nregression tests, which will help make sure that we don’t break things\nas we start refactoring and reorganizing the code later.\n",
        "url"      : "https://blog.janestreet.com/repeatable-exploratory-programming/",
        "image"    : "https://blog.janestreet.com/repeatable-exploratory-programming/lambdasoup.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "OCaml all the way down",
        "date"     : "April 4, 2018",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "One of the joys of working at Jane Street for the last 15 or so years\nhas been seeing how our software stack has grown in scope. When I\nstarted, I was building pretty narrowly focused systems for doing\nstatistical research on trading strategies, and then building systems\nfor executing those same strategies.\n\nIn the decade or so since then, the software we’ve built has grown in\nall sorts of unexpected directions, reflecting the diverse needs of\nour business, which is way richer and more complex than I would have\nimagined at first.\n\nWhatever the purpose of the systems we’ve built, be it mail-handling,\nmarketdata or machine learning, we’ve built nearly all of them in\nOCaml.  That has its ups and downs of course; no language is perfect\nfor everything. But we’ve gotten a lot of benefit from having a\nunified development environment, with shared tools, libraries and\nidioms that we’ve been able to hone to the benefit of the whole\norganization.\n\nThe latest extension of this has been into the space of FPGA\napplications. FPGAs are an increasingly attractive form of\nreconfigurable hardware, and we’ve been excited to have Andy Ray join\nus and help us figure out how to build them effectively, while\nleveraging the value we get from our existing OCaml codebase.\n\nIf you want to learn more about it, Andy’s giving a talk in our New\nYork office on April 18th, and you can find the details of how to sign\nup\nhere.\n",
        "url"      : "https://blog.janestreet.com/ocaml-all-the-way-down/",
        "image"    : "https://blog.janestreet.com/ocaml-all-the-way-down/fpga.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Putting the I back in IDE: Towards a Github Explorer",
        "date"     : "March 27, 2018",
        "authorId" : "jsomers",
        "author"   : "James Somers",
        "tags"     : [],
        "minsToRead" : 13,
        "content"  : "Imagine a system for editing and reviewing code where:\n\n\n  \n    Every branch of every repo gets its own sandboxed directory. Your\nrevision history in each branch, including uncommitted stuff, is\npersisted, as are build artifacts. When you switch contexts, each\nproject is just as you left it.\n  \n  \n    Within your editor, you can pull up a global view of all your\nbranches, your outstanding pull requests, and the pull requests\nyou’re assigned to review. It’s one keystroke to view the summary\nfor a pull, and one more to start editing any of its files right\nthere under your cursor.\n  \n  \n    Code review happens entirely within the editor. You’re fed a series\nof diffs: one keystroke to approve, one keystroke to start\nediting. Dive in, make your changes, leave comments for the author,\npush, and move on.\n  \n\n\nWe’ve actually developed this workflow at Jane Street, and it’s been\nused daily by hundreds of engineers for about two years now. It feels\nlike what an integrated development environment is supposed to feel\nlike. Making software today is as much about collaboration as it is\nabout writing your own code. But most developers are forced to switch\nfrom their editor to a web browser to read someone else’s code, and\nswitch back if they want to play with it themselves.\n\nCode review that takes place in a browser isn’t just more\ninconvenient, it’s often shallower, too: to really understand a piece\nof code you have to build it, run it, and explore it with your own\nhands, in your own editor. 
The system we’ve built at Jane Street is\ndesigned to make that level of engagement seamless.\n\nThe trouble is that it’s pretty coupled to our somewhat uncommon\ntoolchain: we use Mercurial for revision control; instead of\nsubmitting pull requests to Github, we submit “features” to\nIron, a code\nreview system that we wrote in OCaml; and to tie it all together, we\nuse a UI called “Feature Explorer” that we built on top of Emacs.\n\nThe goal of this post, then, is just to show off what the result\nlooks like – to show you what kind of workflow is possible – in the\nhope that it inspires similar tools in other ecosystems. The truth is\nthat after writing code at work this way for a while, you start to\nwish that you had something similar at home. But there’s nothing quite\nlike it for Git and Github and editors like Vim or Textmate or VS\nCode.\n\nWhat Git and Github could look like with truly deep editor integration\n\nWhen you walk around the office here, there’s one window you’ll see\nopen on every developer’s screen:\n\n\n\nThis is your todo in Feature Explorer. At the very top are the\nfeatures you’ve been assigned to review, and below that are the\nfeatures you own.\n\n(Note that we call everything a “feature,” but really they come in two\nflavors: features that are ready for review are the equivalent of pull\nrequests, and features that you’re just privately hacking away on are\nthe equivalent of branches.)\n\nIt looks pretty dull but even here there’s actually a lot of\npower. Notice how some of the features are indented a bit? That’s\nbecause they belong to different repositories. To create a new repo,\nor a new branch within an existing repo, you need only move your\ncursor to the appropriate line and hit !c. 
Navigating across repos\nis as cheap as navigating across branches.\n\nWhen you drill down into a feature, you see a page just like the one\nyou see on Github for a pull request:\n\n\n\n\n\n(What a Github pull request would look like if it were a Feature\nExplorer feature.)\n\nOf course the difference is that here, that “page” is just a buffer in\nyour editor. Which means you can hit Enter and, instead of getting a\nlist of inert diffs in your web browser, you get diffs that can take\nyou, in a single keystroke, to the very lines that were changed –\nloaded right there in your editor.\n\n\n\nWhat’s more, the file you’re looking at isn’t some global copy – it’s\nthe file for that branch. You didn’t have to think about it but you\nare already in the feature’s sandbox. Any changes you make are\nrelative to that feature’s tip revision, and won’t affect any other\nfeatures.\n\nIt’s worth dwelling on that for a second. All you did was hit Enter a\nfew times, but it’s as if you performed four steps at radically\ndifferent levels, all at once:\n\n\n  \n    You used a Github-esque code review tool to get an overall view\nof what the feature (the pull request) does, including the files it\nchanges.\n  \n  \n    You used a Git-esque revision control tool to check out (git\ncheckout) the specific repo and branch for that pull request.\n  \n  \n    You used a file system to switch into the relevant directory\nwhere that repo & branch live (cd ...) so that you can compile,\nrun tests, manage revision history, etc., within a sandbox.\n  \n  \n    You used an editor to load the file you care about into a buffer\nwhere you can actually make changes, use code completion, and so\non.\n  \n\n\nThese levels are blended so seamlessly that the distinctions between\nthem fall away. When you navigate from feature to feature, you don’t\nhave to worry about where you are in your file system, or what revision\nyou’re on. All you’re thinking about is the code you’re looking\nat. 
You get used to thinking of the world not in terms of repositories\nand branches and directories, but in terms of features, i.e., units of\nfunctionality.\n\nThis makes it almost trivial to switch contexts. Let’s say I want to\nbring up a coworker’s pull request on a completely different repo. I\nsplit my Emacs window, navigate to the global todo, cursor over to her\nfeature, Enter, Enter, and now I’ve got a diff, or the underlying file\nitself, loaded in my buffer. I can make changes to the file and push\nthem, I can run the build against my coworker’s changes, etc., all\nwithout affecting anything about my feature; everything I touch is\nsandboxed to the world of her pull request. When I’m done, I close the\nwindow and get back to what I was doing. I literally never had to\nleave my editor.\n\nYou might think, hey, Atom has a\nslick new Github integration! Isn’t that\nthe same thing?\n\nAtom’s Github integration, as much an improvement as it is, is not a\nwhole lot more than a browser window that happens to be inside your\neditor. Navigating to a pull request inside of the Github window\ndoesn’t change “where you are” on disk; you can’t just navigate from\nrepo to repo and, right there in your editor, pull up a file in the\nexact state it’s in in that pull request.\n\nOnce you can do that, you’ll wish the functionality existed\neverywhere people write code.\n\nThe lifecycle of a pull request in Feature Explorer\n\nLet’s go back to the main todo and create a new feature. This is like\ncreating a new branch in git and a pull request in Github at the same\ntime. Just hit !c and give the feature a name, specifying which repo\nit’s a part of.\n\n\n\nWhen you hit Enter, already Iron has created a “workspace” for the new\nfeature. 
Under the hood, workspaces are managed using a Mercurial\nextension called ShareExtension; the git equivalent would be to have\nmultiple clones of the same repo on the same disk.\n\nThe way it works at Jane Street is that each user has a ~/workspaces\ndirectory on their disk. In ~/workspaces/REPO/+clone+/ there’s a full\nclone of the repo; then you do your work in\n~/workspaces/REPO/BRANCH1/+share+ and\n~/workspaces/REPO/BRANCH2/+share+/ and so on.\n\nThose +share+ directories have very small /.hg directories compared to\nthe main +clone+ one, because the versions in the +share+ directories\nare mostly made up of pointers to the +clone+ directories.\n\nSince each workspace is a literal directory on disk, just by cd‘ing\nyou can go from one to the other and get a totally independent\nworkspace where you can create files, run the build, and so on.\n\nBack in the main todo, you’ll see the feature you just created in the\nlist of features you own.\n\n\n\nThere’s a “next step” column because a feature can be in a bunch of\ndifferent states:\n\n\n  \n    It might be currently under review. In this case, you’ll see how\nmany reviewers are left.\n  \n  \n    It might be ready to release. All your code has been reviewed and\napproved: if you have the right permissions, just hit !rl to\n“release” (i.e., merge the pull into master).\n  \n  \n    It might be brand new, waiting for you to add code or enable\nreview. A feature where review hasn’t been enabled is just like a\nbranch you’re privately working on, rather than being a pull\nrequest. It isn’t visible to other people. 
Get in there and start\nhacking.\n  \n\n\nFeatures that are red have a failing build, the yellow\nones are under review, the green ones are ready to release, and the\nwhite ones are works in progress.\n\nTo add files to your newly created pull request, just nav to its main\npage and use your editor’s keybindings to create a new file.\n\nWhen you do that, you’ll notice that your root directory has changed\nto ~/workspaces/REPO/BRANCH/+share+/ because you’re in the sandbox.\n\n\n\nTo “submit a pull request,” you need only “enable review,” i.e., hit\n!e on the feature’s main page – suddenly, the feature will be\npublicly visible and will show up in other people’s todos. What’s neat\nabout this workflow is that it’s just as easy to disable review, say\nif someone’s comments in code review led to a major rethink of how to\napproach the feature. You can go quiet, hack hack hack, and re-enable\nonce you’re ready to show your code to the world again.\n\nCode Review\n\nWe put code review at the top of your todo and color it yellow as a\nbit of a nudge. The idea is that someone else’s code that’s almost\nfully baked and almost ready to be released is basically a\nhair-on-fire priority compared to code that you’ve just started\nwriting. It’s worth interrupting you for.\n\nBut we want to get you in and out of code review as quickly as\npossible. Here’s how it works. In Feature Explorer, doing review is a\nmatter of going to the global todo, hitting Enter on one of the\nfeatures assigned to you, and pressing r. That’ll bring up a screen\nlike this:\n\n\n\nEach of those lines represents a patch for your review. 
Hit Enter\nagain to read it:\n\n\n\nThere’s actually a lot of intelligence behind these screens – our\ncode review system, Iron,\nuses a sophisticated algebra\nto calculate which diffs to show a reviewer based on what they’ve\nalready reviewed, what merges and rebases there’ve been in the\nmeantime, and the like – but the point to emphasize here is just that\nit’s all right there in your editor.\n\nIf you like what you see in a given patch, just hit !r to approve\nit. If you don’t understand it, or want to leave a comment, you can\nopen the file by pressing e. The file will be opened in your editor\nin a feature-specific sandbox. Feature Explorer will even put your\ncursor on exactly the line affected by the patch.\n\nAt Jane Street, reviewers leave notes as literal comments in the\ncode. They’re distinguished by the special designation “CR” at the\nbeginning of the comment. You can assign CRs, resolve them, etc., all\njust by editing the comment’s text using a special (but simple)\nsyntax.\n\n\n\n\n\n(Code review conversations take place in the code itself using\nspecial “CR” comments. Pending CRs show up in the assigned user’s\ntodo.)\n\nThis system might seem primitive compared to Github’s in-browser\ncommenting UI, but it means that you can do code review entirely\nwithin your editor. And since Feature Explorer is so tightly\nintegrated with the code review system, it’s trivial to bring other\npeople into the mix: if you assign someone a CR, the feature it’s a\npart of will immediately show up in their todo.\n\nGo fork and prosper\n\nThere’s nothing special about Mercurial, Iron, or even Emacs that\nmakes this system possible. Any extensible editor – like Vim, Atom,\nVS Code, Textmate – could support the core UI. 
Deep integrations for\ngit already exist in most of these editors, and the Github API should\nbe powerful enough to support an Iron-like code review system.\n\nThe highest-impact parts are in some ways the simplest, for example,\nthe part that lets you navigate from a Github pull request down to a\nfile in a specific branch on your local disk. If you’ve already got a\nclone of a repo, branching is cheap; creating sandboxed directories\nis cheap; and with a good enough Internet connection, listing pull\nrequests and the files they affect is cheap. (One of the things we\nfound with our system is that it didn’t really click until we had a\nsub-100ms code review server.) The rest is mostly editor glue.\n\nOur system works well for us, but as sometimes happens with a piece of\nproprietary software built over a long time, it really only works\nfor us. One hopes the idea is intriguing enough to encourage\ndevelopment elsewhere.\n",
        "url"      : "https://blog.janestreet.com/putting-the-i-back-in-ide-towards-a-github-explorer/",
        "image"    : "https://blog.janestreet.com/putting-the-i-back-in-ide-towards-a-github-explorer/postimage.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Learn OCaml in NYC",
        "date"     : "February 16, 2018",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "Interested in learning OCaml? In the NYC area? Then this might\nbe for you!\n\nJane Street is running a day-long workshop on March 24th to teach\nOCaml. Our goal is to get people comfortable with the basics of the\nlanguage and show them how to work effectively with the latest and\ngreatest tools.\n\nFrequently Anticipated Questions\n\nWhat kinds of things do you expect to cover?\n\nThe workshop is organized around a set of exercises that take you\nthrough the basics of the language. In addition to that, we’ll teach\nyou how to use the latest tools for OCaml, including\nDune for building your code,\nMerlin for IDE-like support (type\nthrowback, go-to-definition),\nexpect tests for testing your program\nand visualizing its output, and\njs_of_ocaml for developing for the\nbrowser.\n\nWho is this for?\n\nThe workshop is aimed at programmers who don’t know OCaml, i.e.,\npeople who are already comfortable and effective programmers in their\nlanguage of choice, but happen not to know OCaml, and probably don’t\nknow any language like it.  We expect some people will come in with\nexperience with other functional languages, and some with no\nfunctional experience at all. That’s all fine.\n\nWhere do I register?\n\nThe registration link is\nhere, and registration\ncloses March 16. Space is limited, so sign up early.\n\nHow much does it cost?\n\nNothing!\n\nWhen is it?\n\nThe workshop is on March 24th, from 10AM to 4PM. We’ll serve lunch and\nhave snacks during the day, but you’re on your own for breakfast and\ndinner.\n\nWhat do I need to do in advance?\n\nWe’re planning on putting up some instructions on how to get your\nplatform set up in advance, so there’s a minimum of fuss on the\nday. 
We’ll have enough people to help with some installation problems\nduring the workshop, but you and everyone else will have a better\nexperience if you do most of the installation nonsense in advance.\nWe’ll have a mailing list where you can ask questions about that kind\nof stuff if you get stuck.\n\nCan I come if I already know OCaml?\n\nThe basic workshop material is more suited to OCaml beginners, but if\nenough people with OCaml experience register, we’ll do an advanced\ntrack where we cover some other topics, like\nthe\nIncremental\nlibrary for efficient on-line algorithms. So if you’re interested and\ndo know OCaml already, please do apply. (Also, there are some topics,\nlike Dune and expect tests, that are worth learning even if you\nalready know OCaml.)\n",
        "url"      : "https://blog.janestreet.com/learn-ocaml-nyc/",
        "image"    : "https://blog.janestreet.com/learn-ocaml-nyc/ocaml_workshop.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Proofs (and Refutations) using Z3",
        "date"     : "February 15, 2018",
        "authorId" : "xclerc",
        "author"   : "Xavier Clerc",
        "tags"     : [],
        "minsToRead" : 11,
        "content"  : "People often think of formal methods and theorem provers as forbidding\ntools, cool in theory but with a steep learning curve that makes them\nhard to use in real life. In this post, we’re going to describe a case\nwe ran into recently where we were able to leverage theorem proving\ntechnology, Z3 in particular, to validate some real world engineering\nwe were doing on the OCaml compiler. This post is aimed at readers\ninterested in compilers, but assumes no familiarity with actual\ncompiler development.\n\nWe’ll start by discussing the kind of compiler optimizations we’re\ninterested in and why they might be difficult to get right, in\nparticular when applied to floating-point computations.  We’ll then\nshow how we used Z3 to review some optimizations we were considering\nadding to the OCaml compiler, and finding a subtle bug in one of them.\n\nConstant Folding\n\nCompilers usually perform a lot of optimizations to ensure that the\nprograms written by developers run as fast as possible. Among these\noptimizations, a common one is\nconstant folding,\nwhich allows the compiler to rewrite an expression such as 2 * 3 to\n6, performing the computation at compile time rather than at run\ntime. Of course, developers don’t usually write expression like\n 2 * 3, but such examples show up anyway, because other compiler\noptimizations can create them. e.g., by replacing a symbolic constant\nwith its actual value. Similarly, constant folding can in turn trigger\nother optimizations, like determining that a code fragment is\nunreachable, which leads to it being deleted.\n\nA natural question is, of course, the applicability of such\noptimizations. 
In the previous example, a multiplication can be\ncomputed at compile time not only if we know the values of both\noperands, but also by applying basic mathematical knowledge such as:\n\n\n  the presence of a neutral element: 1 * expr can be rewritten\nto expr;\n  the presence of an absorbing element: 0 * expr can be\nrewritten to 0;\n  the equivalence with another operation: -1 * expr can be\nrewritten to -expr (the latter being both shorter and faster on\nmost CPUs).\n\n\nA common pitfall for compiler developers is to apply mathematical\nknowledge to objects having slightly different properties. Indeed,\nwhile the rewrites above are trivially correct for mathematical\nintegers, one has to keep in mind that what we call integers on\ncomputers is typically quite different from what we call integers in\nmathematics. For instance, in most programming languages, integer\nvalues are represented using a fixed number of bits which leads to a\nbounded range of possible integers.\n\nAs an example, if a signed integer is represented using 8 bits, then the range\nof possible values is usually -128..127. In this setting, is it correct to\nrewrite -1 * expr to -expr? It turns out that it is, but only because the\nusual way overflow is handled (at least when integer values are encoded using\nthe two’s complement\nrepresentation) is such that -(-128) = -128. The mathematical intuition was\ncorrect, but for not-so-obvious reasons. Floating-point numbers are\nunfortunately even trickier to manipulate.\n\nFloating-point Computations\n\nFloating-point numbers\nare an approximate representation of real numbers. Floating-point computations\nare notoriously error-prone, mostly because some basic mathematical properties\nover real numbers do not hold for floating-point values. For instance,\nfloating-point addition is not associative i.e. 
given three floating-point\nvalues x, y and z, the following computations do not always yield the same\nresult:\n\n\n  x + (y + z);\n  (x + y) + z.\n\n\nTo complicate things even further, some floating-point subtleties have no direct\nequivalent in mathematics, such as the fact that there are two different values\nfor zero: a positive one and a negative one. The values will be considered equal\nwhen compared but may yield different results when used in computations. For\ninstance, 1.0 / +0.0 is equal to positive infinity while 1.0 / -0.0 is equal\nto negative infinity. (Which, incidentally, means it is not safe to replace a\nfloating-point value with another one that is equal according to floating-point\ncomparison.)\n\nCompiler optimizations are expected to transform programs while retaining the\nsame behavior. As a consequence, a number of compilers either do not perform any\noptimization on floating-point computations, or provide means to specifically\ndisable such optimizations. The solution we discuss in this post is the\nrestriction to optimizations that always rewrite to computations producing the\nsame result.\n\nWhile the previous sentence seems to be perfectly reasonable, it is\nmissing practical details: what do we mean by “always” and “same”? The\nlatter is easy to define: we regard two floating-point values as\nidentical if and only if they have the very same bit pattern according\nto the\nIEEE 754\nspecification (this ensures that they could not be discriminated by\nany operation). The former is considerably trickier to enforce\nbecause all possible combinations of values have to be considered, and\nideally all possible rounding modes. A rounding mode determines which\nvalue should be chosen when the exact result of an operation cannot be\nrepresented (because floating-point values are only approximations).\n\nAutomated Theorem Proving to the Rescue\n\nThe traditional way to demonstrate that a property is always true is to write a\nmathematical proof. 
However, writing a formal proof is hard work, so we are\ngoing to take an alternate approach, which is to use a tool to automate\nthe proof.\n\nA SAT solver is a tool able to solve a\nboolean SATisfiability problem,\ni.e. to determine whether, given a logical formula such as\na ∧ (b ∨ c) ∧ ¬ c ∧ ¬ (b ∧ d),\nthere exists a mapping from the formula variables to truth values that evaluates\nthe formula to true.\n\nSMT stands for\nSatisfiability Modulo Theories,\nwhere the theories act as built-in knowledge about various kinds of\nstructures such as integers or arrays. SMT solvers thus accept formulas such as\n(a &gt; 10) ∧ (b ∨ (c = 2)) ∧ ¬ (b ∧ (d[c] &lt; 0)).\nThis built-in knowledge makes SMT solvers much more efficient than SAT\nsolvers because in a number of situations they can leverage properties from a\ngiven theory in order to reduce the search space.\n\nThe basic use of an SMT solver consists of declaring various entities\nand expressing constraints over those entities before asking the\nsolver whether the constraints can be satisfied (i.e. can hold\nsimultaneously). If the constraints can be satisfied, the solver can\nusually produce a model, that is, a mapping from entities to values\nthat satisfy the constraints.\n\nZ3 is an SMT solver that comes with support\nfor both integer values (through bit vectors of custom sizes) and floating-point\nvalues (using the IEEE 754 representation). 
Assuming we want to check whether\nthe positive zero is a right neutral element for addition, we would submit the\nfollowing query to Z3:\n\n(set-logic QF_FP)\n\n(declare-const x (_ FloatingPoint 11 53))\n(define-fun z () (_ FloatingPoint 11 53) (_ +zero 11 53))\n(declare-const r RoundingMode)\n\n(assert (not (= (fp.add r x z) x)))\n\n(check-sat)\n(get-model)\n\n\n\nBesides the first line that sets the theory to be used, the file is relatively\nstraightforward:\n\n\n  the declare-const/define-fun lines declare the various entities to be\nused later on: an unknown floating-point value, an alias for the positive\nzero, and an unknown rounding mode (the values 11 and 53 are simply\nthe numbers of bits used for respectively the exponent and the mantissa\nof a floating-point value);\n  the assert line encodes the formula we want to check i.e. x + +0 = x;\n  the last two lines ask Z3 to check the satisfiability of the formula and\nto output the model if any.\n\n\nIt is noteworthy that we actually encode the negation of the property we want to\ncheck. As a consequence, satisfiability means that the property does not hold,\nand the model is actually a counterexample. If we get a counterexample, then we\ncan double-check that Z3 is right. Otherwise, we basically have to trust both Z3\nitself and our use of the tool.\n\nZ3 answers the previous query with the following:\n\nsat\n(model\n  (define-fun x () (_ FloatingPoint 11 53)\n    (_ -zero 11 53))\n  (define-fun r () RoundingMode\n   roundNearestTiesToEven)\n)\n\n\n\nOn the first line, sat means that the assertion is satisfied. The other lines\nsimply present a model (i.e. a counterexample for us) where x is bound to\nnegative zero. The suspicious Z3 user can then feed the values to an OCaml\ntoplevel:\n\nlet x = -0.\nand z = +0.\nand bits = Int64.bits_of_float in\nbits (x +. z), bits x;;\n\n\n\nwhose output indeed shows two different values (0 and\n-9223372036854775808).  
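For readers without an OCaml toplevel handy, the same bit-level check can be sketched in Python (an illustrative aside of ours, not part of the original toolchain; it assumes IEEE 754 doubles and the default round-to-nearest mode):

```python
import struct

def bits(f):
    # Raw 64-bit pattern of a double, analogous to OCaml's Int64.bits_of_float.
    return struct.unpack("<q", struct.pack("<d", f))[0]

x = -0.0  # Z3's counterexample
z = +0.0

# Under round-to-nearest, -0.0 + (+0.0) evaluates to +0.0, so the
# rewrite x + +0 -> x would silently flip the sign bit of x.
print(bits(x + z))  # 0
print(bits(x))      # -9223372036854775808
```

The two printed bit patterns match the ones the OCaml toplevel produced.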
We should hence not rewrite x + +0 to x.\n\nShould Z3 have answered that there is no model for a given request, we would\nhave been able to ask it to produce a proof. However, the proof establishing\nthat there is no model for (assert (not (= (fp.mul r x one) x))) (i.e. that\nthe x * 1 ⇝ x rewrite is correct) has more than 100,000 lines. This\nmeans that, in practice, such a proof would only be passed to another tool and\nthus only move the trust issue to another piece of code…\n\nUpcoming flambda 2.0\n\nThe previous example was not chosen at random. The x + +0 ⇝ x\nrewrite was for a (very) short time part of the development version of the\ncompiler, and then rapidly removed once Z3 proved its invalidity, possibly\nsaving us a painful debugging session later on.\n\nUsing Z3, we have checked that all the simplifications over integer and\nfloating-point values (involving arithmetic, bitwise and shifting operators) in\nthe upcoming flambda 2.0 variant of the OCaml compiler are correct. Moreover, these\nsimplifications are correct for all integer sizes (OCaml defining 4 kinds of\nintegers) and all rounding modes (OCaml offers no means to tweak the rounding\nmode, but it could be modified through an external function call).\n\nEventually, we plan to use a ppx extension combined with a simple DSL in order to\nbe able to express the properties underlying the various rewrites as mere\nattributes of the OCaml code actually performing the rewrites (hence ensuring\nthat the checks and the code are kept synchronized). 
The simplifications for\nfloating-point multiplication when only one operand is known would look like:\n\n...\n| Mul -&gt;\n  if F.equal this_side F.one then\n    The_other_side\n    [@z3 check_float_binary_neutral `Mul (+1.0) `Right]\n    [@z3 check_float_binary_neutral `Mul (+1.0) `Left]\n  else if F.equal this_side F.minus_one then\n    Float_negation_of_the_other_side\n    [@z3 check_float_binary_opposite `Mul (-1.0) `Left]\n    [@z3 check_float_binary_opposite `Mul (-1.0) `Right]\n  else\n    Cannot_simplify\n...\n\n\n\nRelying on a tool such as Z3 to check whether candidate rewrites are correct is\ninvaluable. It not only ensures that we do not introduce (possibly subtle\nand hard-to-reproduce) bugs, but also paves the way for more complex rewrites.\nIndeed, the safety net acts as an encouragement to experiment with ideas that\ncould be deemed too risky otherwise.\n\nFurther reading\n\nVerifying Optimizations using SMT Solvers\n(by Nuno Lopes) shows how Z3 can be used to ensure that transforms from LLVM IR\nto assembly code are correct.\n\nWhat Every Computer Scientist Should Know About Floating-Point Arithmetic\n(by David Goldberg) is an in-depth lecture about floating-point representation\nand operations, addressing a number of common misconceptions.\n",
        "url"      : "https://blog.janestreet.com/proofs-and-refutations-using-z3/",
        "image"    : "https://blog.janestreet.com/proofs-and-refutations-using-z3/proof.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Work on the OCaml compiler at Jane Street!",
        "date"     : "December 20, 2017",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "As Jane Street grows, the quality of the development tools we use\nmatters more and more.  We increasingly work on the OCaml compiler\nitself: adding useful language features, fine-tuning the type system\nand improving the performance of the generated code. Alongside this,\nwe also work on the surrounding toolchain, developing new tools for\nprofiling, debugging, documentation and build automation.\n\nWe’re looking to hire a developer with experience working on compilers\nto join us. That experience might be from working on a production\ncompiler in industry or from working on research compilers in an\nacademic setting. No previous experience with OCaml or functional\nprogramming languages is required.\n\nWe’re looking for candidates for both our London and New York offices,\nand you can apply\nhere.\n\n",
        "url"      : "https://blog.janestreet.com/work-on-the-ocaml-compiler-at-jane-street/",
        "image"    : "https://blog.janestreet.com/work-on-the-ocaml-compiler-at-jane-street/compiler3d.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Does batch size matter?",
        "date"     : "October 31, 2017",
        "authorId" : "chardin",
        "author"   : "Chris Hardin",
        "tags"     : [],
        "minsToRead" : 12,
        "content"  : "This post is aimed at readers who are already familiar with\nstochastic gradient descent\n(SGD) and terms like “batch size”.  For an introduction to these\nideas, I recommend Goodfellow et al.’s\nDeep Learning, in particular the\nintroduction and, for more about SGD, Chapter 8.  The relevance of SGD\nis that it has made it feasible to work with much more complex models\nthan was formerly possible.\n\nThere is a lot of talk about batch size in stochastic gradient descent\n(SGD).  I will argue in this post that batch size mostly doesn’t\nmatter (as long as it’s not too large), but it can seem to matter when\nyou express SGD in terms of gradients averaged over batches rather\nthan summed.  I personally think that expressing SGD in terms of\ngradients averaged over batches is an unfortunate historical accident\nthat leads to unnecessary confusion about the role of batch size.\n\nA recent preprint by\nSamuel Smith and Quoc Le\n(2017) lends some empirical support to the notion that batch size\ndoesn’t matter (though the authors frame it very differently!).  They\nalso propose rules for how batch size is related to learning rate,\ntraining set size, and momentum, though essentially the same rules already appear in\nMandt et al. (2017).  Smith\nand Le treat SGD as a stochastic differential equation (SDE), as do\nMandt et al., who also cite Kushner and Yin (2003) and Ljung et\nal. (2012) as earlier examples.  The empirical evidence that\nSmith and Le’s work gives for the rules proposed by Mandt et\nal. is valuable.\n\nAnyway, on to the batch size bashing.\n\nTwo ways to write vanilla SGD\nConsider vanilla SGD with batch size B.  If we are estimating\nθ, each step can be written as\n\n\n&theta;new = &theta;old&nbsp;&minus;&nbsp;&epsilon;&nbsp;gavg\n\n\nwhere gavg is gradient averaged over the batch and\nε is the learning rate.  
One could alternatively write\n\n\n&theta;new = &theta;old&nbsp;&minus;&nbsp;&tau;&nbsp;gsum\n\n\nwhere gsum is gradient summed over the batch; the term\n“learning rate” is already taken, so let’s call τ\ntemperature (for reasons which will soon become evident).  The above\ntwo ways of writing SGD are equivalent when\nε/B = τ.\n\nI claim that τ matters and batch size mostly doesn’t\n(provided it isn’t too large).  If you fix a particular\nε, however, and then vary B, then B will\nseem to matter a lot—but this will be mostly because by fixing\nε you have tied τ and B together,\nand temperature τ does matter.\n\nConsider Figure 5a from Smith and Le, where test accuracy is plotted as a\nfunction of B for several different values of ε, which by eyeballing\nI have crudely reproduced here as the graph on the left.\nBecause ε has a fixed value on each curve, the value of\nB appears to matter.  If you take that data and present it as a function of τ\nfor several values of B, you get the graph on the right:\n\n\n\nI have conveniently dropped the points where ε = 10—those\nwere points where B is too large, and batch size definitely does\nmatter (in a bad way) when it is too large.\n\nThis suggests that test performance is a function of temperature and\nthat B doesn’t matter (again, provided it isn’t too large).  This is\na stronger statement than claiming the optimal B is linear in\nε.  If the peaks in the left graph had different heights, we\nwould still have the same scaling rule without test performance being\na function of just temperature.  
The fact that the peaks have the same\nheight is telling us something beyond the scaling rule.\n\nIn high-level terms: If you are numerically solving an SDE, then once\nyou get below a certain step size, the noise dominates the\ndiscretization error, so further reductions in step size don’t have\nmuch effect on the distribution you get; here, B is playing the role\nof step size, so this is saying B doesn’t matter much once it’s\nsmall enough that you’re doing a decent job of solving this SDE.  On\nthe other hand, if you insert a factor of 1/B into this SDE (which\nis what you’re doing when you use gradient averaged over a batch\nrather than summed) then changing B is changing the SDE which will\nhave a big effect on the distribution you get. But it’s not because of\nstep size, it’s the factor of 1/B in the SDE that really explains\nthe difference.\n\nLikewise, since τ matters and B doesn’t, one should expect\nthere to be a single optimal τ without regard to B.\nSince τ = ε/B, the proportionality rule\nBopt ∝ ε illustrated in Figure 5b\nof Smith and Le is a consequence of this.\n\nThere could very well be interesting 2nd-order effects in the choice\nof batch size, but they are obscured by this 1st-order effect you get\nwhen you fix an ε and turn B into a proxy for\nτ.  More importantly, there could be separate reasons to want\na particular batch size, perhaps for performance, or for\ntuning batch normalization.  By thinking in terms of τ\nrather than ε, you are more free to choose B according\nto these other criteria.\n\nSGD as a sampler rather than optimizer\nNow consider how these parameters should vary depending on training\nset size N.\nSmith and Le, and earlier work such as Mandt et al. (2017), propose\nthe rule B ∝ εN.  
This can be equivalently stated\nas τ ∝ 1/N, and I think this latter formulation makes\nmore clear that batch size isn’t a major consideration.\n\nI prefer to arrive at τ ∝ 1/N more via Mandt et al.’s approach,\nwhich views SGD as a sampler rather than an optimizer.  To motivate this\npoint of view, suppose you could choose between the following\n(the silly numbering will be justified later):\n\n\nYou observe N data points, and use the max-likelihood &theta; (or\nperhaps\nMAP\n&theta;) to make predictions.\n\nYou observe N data points, and draw a single &theta;\nfrom your posterior, which you use to make predictions.\n\n\nIf you are in a context where\nasymptotic normality of MLE\napplies, you should favor (0).  But if your posterior is some hairy\nhigh-dimensional thing, crisscrossed with deep but narrow ravines, (0)\nis awful1 and (1) is decent to very good.  (Ideally you’d\nintegrate over your posterior, and (1) is just an approximation of\nthis that integrates for 1 point.  That such a 1-point integral can do\na good job might be counterintuitive, and I might say more about this\nin a future post.)\n\nSo, I think it is better to treat the aim of SGD as being an\napproximation of (1) rather than an approximation of (0).  This is the\ncontext of Mandt et al., though I frame it here a little differently\nthan they do—in particular, they write things in terms of\nε rather than τ, which unnecessarily pulls batch size into\nthe picture.  I think the perspective of treating SGD as sampling from\na distribution is a more compelling way to understand SGD’s\ngeneralization properties than to think of it in terms of preferring\ncertain kinds of minima over other kinds of minima.  (Among other\nthings, if you’re not decreasing learning rate to 0, you’re not\nfinding any minima.)\n\nFor a given choice of temperature τ, the SGD process will, in the\nlong run, approximately follow some Boltzmann distribution\nexp(-E(θ)/τ).  
What we would like is for this to resemble\nour posterior, which looks like exp(-L(θ)/(1/N)) where\nL(θ) is negative log-likelihood, averaged over the dataset.2\nThis suggests the scaling rule τ ∝ 1/N.  (I don’t think\nthere’s anything too deep here: if you double the number of data\npoints, the influence of any one point should get cut in half.)\n\nSubstituting τ = ε/B, this becomes the rule\nε ∝ B/N which Mandt et al. propose on page 9.\nSmith and Le write this in the equivalent form B ∝ εN.\n\nBack to the choice presented above: what if you were in a case\nwhere (0) is better, or you wanted something in between?  Observe\nthat (0) and (1) are special cases of the same more general rule; if\nyour posterior is exp(-ℓ(θ)), then they are both sampling\nfrom exp(-ℓ(θ)/T): (0) is the limiting case as\nT→0, and (1) is the case T=1.  So maybe there’s some fudge\nfactor that should be included in temperature to get the right balance\nbetween (0) and (1)—i.e., you’d like to integrate over your\nposterior, but once you’re told your integral only gets to have a\nsingle point, you might want to “sharpen” your posterior before the\npoint is drawn.  But this T is absorbed as a multiplicative factor in\nτ, so if you are tuning τ based on validation performance, you\nget tuning of T for free.\n\nBurn-in\nWe have glossed over the issue of burn-in by talking in terms of what\ntemperature you should want in order to get good long-term behavior\nout of SGD.  For very large data sets, if you use this temperature\nthroughout training, you are not going to be running long enough for\nyour process to resemble a stationary process.\n\nIn particular, while we expect the optimal long-term temperature to\nobey the scaling rule τ ∝ 1/N, the\noptimal initial temperature probably should not vary much with training set\nsize N.  (This is consistent with the general consensus that you\nshould start with a high learning rate that is decreased during\ntraining.)  
Temperature is how\nmuch you change your mind about θ when shown another data\npoint.  With twice as much data, in the long run you should be\nchanging your mind half as much on each point—that’s the\nτ ∝ 1/N rule.  But in terms of getting to a reasonable belief\nearly in training, that first point should\nchange your mind the same amount regardless of how many more points\nyou’re going to get to see.  A different way to put this: You probably\nwant to crank up the temperature at first for the sake of mixing even\nif that higher temperature would yield the wrong long-term behavior if\nyou maintained it.3\n\nNote that lower temperatures are going to tolerate larger batch sizes\n(in terms of SDEs, the slower your state is changing, the larger the\ntime steps you can get away with while still doing a reasonable job of\nsolving the SDE), so a large batch size that might be acceptable at\nthe low temperatures occurring late in training might be too large\nduring early training at a higher temperature, and you might want to\nuse smaller batch sizes during early high-temperature training.\n\nMomentum\nThe same “batch size doesn’t really matter” ideas apply if you use SGD\nwith momentum, provided the momentum decay is stated in\nterms of “decay per point” rather than “decay per batch”, and velocity\nis also expressed in per-point rather than per-batch terms.  For\nalgorithms like RMSProp and Adam, there is the further complication\nthat gradients are divided by some running estimate of standard\ndeviation of gradient; here, I’m guessing you can still get\n(approximate) indifference to batch size, but you must state the\nalgorithms in terms of estimating what the standard deviations of the\nper-point gradients are.  
(That doesn’t require any changes to the libraries\nyou’re using—you just work with one parametrization in your own code\nand translate it to the convention of the code you’re calling.)\n\nThere’s also the question of how to make temperature maintain roughly\nthe same meaning across different values for the momentum damping\nparameter μ (think of 1−μ as decay per point).  One must\nwrite the velocity updates as\nvnew =(1−μ)B vold − τμgsum.\n(The factor of μ when gradient is added to velocity corrects for\nthe fact that, once present, it will affect θ for many steps.)\nOne should still expect τ ∝ 1/N.  Relating to\nε, B, and m (momentum decay per batch) one has\nm = (1−μ)B and\nτ = ε/((1−m)B) which\nyields the rule\nB ∝ εN/(1−m) that\nessentially appears in Smith and Le and Mandt et al. (once notation is\naccounted for).\n\nI don’t mean to suggest that this makes momentum somehow inconsequential;\nin particular, momentum can affect how quickly you mix.\n\nReferences\nIan Goodfellow, Yoshua Bengio, and Aaron Courville.  Deep Learning.\nMIT Press, 2016.  Available online.\n\nHarold J. Kushner and George Yin.  Stochastic approximation and recursive algorithms\nand applications, volume 35.  Springer Science & Business Media, 2003.\n\nLennart Ljung, Georg Ch. Pflug, and Harro Walk.  Stochastic approximation\nand optimization of random systems, volume 17.  Birkhäuser, 2012.\n\nStephan Mandt, Matthew D. Hoffman, and David M. Blei.  Stochastic gradient descent\nas approximate Bayesian inference.\narXiv preprint arXiv:1704.04289, 2017.\n\nSamuel L. Smith and Quoc V. Le.  A Bayesian perspective on\ngeneralization and stochastic gradient descent.\narXiv preprint arXiv:1710.06451, 2017.\n\n\n\n\n  \n    The awfulness of (0) in the hairy high-dimensional case is often\nseen in practice, but does it exist in principle?  
If you know you’re\ngoing to be using a MAP estimate rather than integrating over your\nposterior, this can influence the design of the thing you call your\nprior in order to make (0) work better.  For more information, consult\nan MDL\nadherent.\n  \n  \n    A separate question is how to make E(θ) resemble\nL(θ), and this is much of what Mandt et\nal. is about.\n  \n  \n    Even with this higher initial temperature, you still might not get\ngood mixing, since the “good” parts of your posterior can be\ndifficult to find regardless of algorithm.  (Write your favorite\nNP-hard problem in terms of a Boltzmann distribution…) This\nis more speculative, but it seems that the temperature that would\nbe optimal if you did reach the stationary distribution will still\nhave good generalization properties even if you never manage to\nfind your way out of some suboptimal region of the model space.\n  \n\n",
        "url"      : "https://blog.janestreet.com/does-batch-size-matter/",
        "image"    : "https://blog.janestreet.com/does-batch-size-matter/batch-01.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "How Jane Street Does Code Review (Jane Street Tech Talk)",
        "date"     : "October 29, 2017",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "It’s time for our next\nJane Street Tech Talk. When\nwe’ve solicited suggestions for topics, one common request has been to\ntalk about our internal development process. Our next talk,\nHow Jane Street Does Code Review,\nshould fit the bill. The talk is being given by our own Ian Henry, and\ndiscusses how we approach code review, and in particular how Iron, the\ncode review system we’ve been using and improving for some years now,\nfits in to that process.\n\nSo join us! The talk will be on Wednesday Nov 15th. You can\nread more about the talk\nor just register.\n",
        "url"      : "https://blog.janestreet.com/jane-street-tech-talk-how-jane-street-does-code-review/",
        "image"    : "https://blog.janestreet.com/jane-street-tech-talk-how-jane-street-does-code-review/image.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Jane Street Tech Talk, Verifying Network Data Planes",
        "date"     : "September 26, 2017",
        "authorId" : "seliopoulos",
        "author"   : "Spiros Eliopoulos",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "After a summer hiatus, the Jane Street Tech Talks series is back on\nfor the fall! Last we left it, our very own Dominick LoBraico\npresented on the evolution of our internal configuration methodology\nand the systems that support it. For anybody that missed it, you can\ncheck out a recording of the talk on YouTube.\n\nTo kick off the new season, we’ve invited Nate Foster of\nCornell University and Barefoot Networks to brief us on the latest\ndevelopments in the realm of software-defined networking. In\nhis talk, he’ll present the P4 data plane programming\nlanguage, and its accompanying verification tool p4v.\n\nNate, along with a cadre of programming language and systems\nresearchers, have been working on various languages and verification\ntools for software-defined networking, starting with Frenetic,\nfollowing it up with NetKAT, and now P4. I had the pleasure of\nworking with Nate on the Frenetic controller platform and the NetKAT\ncompiler a few years back. What’s remarkable about Nate and his\ncolleagues is they not only consistently publish top-tier research\nresults, but also produce working, usable software artifacts and\nsystems.\n\nSo join us! The talk will take place on Monday, October 16. Click\nhere to register.\n\nFor a full description and speaker bio, check out the talk\npage.\n\n",
        "url"      : "https://blog.janestreet.com/jane-street-tech-talk-verifying-network-data-planes/",
        "image"    : "https://blog.janestreet.com/jane-street-tech-talk-verifying-network-data-planes/tech-talk-nate-foster.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Real world machine learning (part 1)",
        "date"     : "August 28, 2017",
        "authorId" : "alebron",
        "author"   : null,
        "tags"     : [],
        "minsToRead" : 5,
        "content"  : "Trading is a competitive business. You need great people and great\ntechnology, of course, but also trading strategies that make money.\nWhere do those strategies come from? In this post we’ll discuss how\nthe interplay of data, math and technology informs how we develop and\nrun strategies.\n\nMachine learning (ML) at Jane Street begins, unsurprisingly, with data. We\ncollect and store around 2.3TB of market data every day. Hidden in those\npetabytes of data are the relationships and statistical regularities\nwhich inform the models inside our strategies. But it’s not just awesome\nmodels. ML work in a production environment like Jane Street’s involves\nmany interconnected pieces:\n\n\n  \n    Get the data: Building the infrastructure to gather, store, index\n  and retrieve this amount of data efficiently and with microsecond-level\n  accuracy is itself an interesting job. We have a whole team dedicated\n  to this important work. If we fail to log data at any point then that\n  data is gone, never to return.\n  \n  \n    Clean the data: The raw data that we receive is frequently missing,\n  corrupted, misaligned, or has other issues. Before we can deploy any\n  modeling techniques, we need to sanitize the data. This is a crucially\n  important part of the process, unavoidable but admittedly tedious.\n  1\n  \n  \n    Explore the data: It’s hard to know what techniques to throw\n  at a problem before we understand what the data looks like, and\n  indeed figure out what data to use. Spending the time to visualize\n  and understand the structure of the problem helps pick the right\n  modeling tools for the job. Plus, pretty plots are catnip to traders\n  and researchers!\n  \n  \n    Leverage domain expertise: The more you know about the problem\n  you are trying to solve, the greater your ability to build good\n  models. 
This comes up in many ways throughout the process: choice\n  of objective function, reasonable approximations, and the algorithm\n  used to solve it. Image models often have translation invariance, for\n  example, while financial models often have low signal to noise ratios\n  and a lot of game theoretic priors. Expertise like this is hard-won,\n  resulting from many previous successful and unsuccessful efforts.\n  2\n  \n  \n    Build a model: This is the part that gets everyone excited in\n  ML. However, we’ve found that standard techniques almost never work\n  out-of-the-box. The more you understand about what makes an algorithm\n  work or fail, the more likely you are to come up with effective ways\n  to modify it and make it work on the problem at hand. Or come up with\n  something entirely new!\n  \n  \n    Validate the model: There is no shortage of ways to fool yourself\n  when building ML systems, especially in a competitive world like\n  trading. Some of the most exciting parts of the process come when a new\n  ML system shows, with high probability, that it’s better than what\n  we had in the past. That’s how we know we’re making real progress.\n  \n  \n    Deploy the model: There is a lot of interesting work when\n  deploying a new model, work that makes the difference between a cool\n  idea and one that actually makes money. It’s important to run\n  it efficiently and reliably, of course, but also to ensure that a\n  predictive model’s mistakes aren’t catastrophic. What’s more,\n  once you start trading the market will adapt to your strategy, making\n  your model less effective over time. More confusingly, if you’re\n  not careful you may enter a bad feedback loop where the next model you\n  build looks at your own current trades as evidence that “the market\n  agrees that I should trade here”! 
Issues like these make applying\n  ML to trading a very challenging problem.\n  \n\n\nOver the years we’ve used a variety of ML techniques: Gaussian\nprocesses, random forests, adaptive regression splines, and genetic\nalgorithms among others. Lately our use of deep learning ideas has been\ngrowing. These ideas (such as very-high-parameter models, backprop-based\nstochastic gradient descent, etc) have taken the world by storm in the\nlast 5 years, and rightly so given the exciting results achieved across\na wide variety of domains. Particularly interesting is that, with a few\ncorner-case exceptions, the world doesn’t yet understand why these\ntechniques generalize as well as they do. This makes deep learning\ntechniques exciting to think about, and our work in this area has led\nto some strategies that we currently use in production. Deep learning\nis a large, exciting and occasionally confusing area of ML, and we’re\noptimistic about what we’ll be able to learn and invent in this area.\n\nNevertheless, the world of ML is much larger and richer than deep\nlearning. If there’s anything Kaggle competitions have taught the world,\nit’s that the best solutions combine a variety of approaches in often\nad-hoc and messy ways. We know that the financial world doesn’t present\nclean problems: the human world is complex and ever-changing. That’s\nwhy Jane Street is committed to seeking out, inventing, developing and\nusing the best possible tools in our trading. We believe that if we are\nnot continually pushing on the boundaries of what is technically and\nintellectually possible we will very quickly stop being competitive. The\nexcitement is in chasing new ideas and putting those ideas in action\nin this competitive environment. This is a big part of what makes Jane\nStreet such an interesting place to work.\n\nOver the next few months on this blog, we’ll be examining interesting\nproblems and examples of ML as it pertains to trading. 
So stay tuned!\n\n\n\n\n  Among the things we’ve seen: missing data, seemingly valid market\ndata appearing when markets are closed, data for the wrong stock,\nfrozen/delayed/intermittent data, etc.\n  An example: modeling the difference in price between an ETF and its\nbasket as a Gaussian would be a mistake, since there are well-defined\narbitrage bounds on this difference.\n\n",
        "url"      : "https://blog.janestreet.com/real-world-machine-learning-part-1/",
        "image"    : "https://blog.janestreet.com/real-world-machine-learning-part-1/inverse_colors.gif",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "How to design a tree diffing algorithm",
        "date"     : "August 25, 2017",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "For those of you interested in what\ninterns\ndo at Jane Street, here’s a\npost from former intern\nTristan Hume, on his work developing tree-diffing algorithms last\nsummer at Jane Street. It’s a fun (and very detailed!) read.\n",
        "url"      : "https://blog.janestreet.com/how-to-design-a-tree-diffing-algorithm/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Ironing out your development style",
        "date"     : "August 24, 2017",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 11,
        "content"  : "People seem to enjoy talking about programming methodologies. They\ngive them cute names, like\neXtreme programming,\nAgile, and\nScrum; run\nconferences and build\ncommunities around them;\nwrite\nbooks\nthat describe how to use them in excruciating detail; and\nmanifestos that lay out their\nphilosophy.\n\nThis approach always leaves me cold.\n\nI prefer stories to manifestos. Big overarching theories of\nprogramming are hard to come by (or at least good ones are) because so\nmuch depends on the details of the technology used, the problem to be\nsolved, and the culture of the organization in question.\n\nInstead, I like to hear people describe the things they’ve tried and\nhow those choices have worked out in practice.  Such stories are hard\nto draw general conclusions from, but hearing them helps to build up\nyour intuition about what the possibilities are.\n\nIn that spirit, I want to tell a story about how we develop\nsoftware. In particular, I wanted to describe a style of development\nthat has gained some traction with us in the last couple of years.\nFor lack of a better name, let’s call this the Iron style of\ndevelopment, since it depends a lot on\nIron,\nour code review and release management tools.\n\nThe Iron style combines the following approaches:\n\n\n  \n    Lots of (expect) tests.\nExpect tests are used pervasively,\nserving as a way of capturing program traces that expose aspects of\nthe behavior of the system to reviewers.\n  \n  \n    Small changes. Individual changes are kept small most of the\ntime, with most changes being somewhere from ten to a few hundred\nlines. Large changes are often created (and reviewed) as chains of\ndependent changes.\n  \n  \n    Fast turnaround. Review is done eagerly, with review taking\nprecedence over writing new code.\n  \n\n\nThis sounds like a fairly mundane list of good practices, and it\nmostly is. 
But the details are important, and not obvious (or at\nleast, they weren’t obvious to us). The way in which the tools\naffected which workflows we were willing to choose, and the interplay\nbetween the different elements of this style, was something we\nwere surprised by.\n\nSo, without further ado, let’s talk about the details.\n\nLots of expect tests\n\nExpect tests (also known as\nunified tests) let you\ninterlace your test code with the printed output of those tests.  The\ntest framework is responsible for capturing the output and integrating\nit into your source file. If such integration would lead to the file\nchanging, then the test has failed.\n\nHere’s a small example using our expect test framework to test the\nfunction List.group.\n\nlet%expect_test _ =\n  let test l =\n    let stringify l = List.map l ~f:Int.to_string |&gt; String.concat ~sep:\"-\" in\n    List.group l ~break:(fun x y -&gt; y &lt; x)\n    |&gt; List.iter ~f:(fun sub -&gt; print_endline (stringify sub))\n  in\n  test [1;2;3;2;3;3;6;1;2;36;7];\n  [%expect {|\n    1-2-3\n    2-3-3-6\n    1-2-36\n    7\n  |}]\n\n\n\nWe don’t actually need to fill in the expect declaration by hand. If\nit starts empty, then the test runner will generate a corrected file\nwith the output shown above, which we can accept by copying it over\nthe original source file.  If the output changes again at some later\npoint, it’s easy to look at the diff to see if the change should be\naccepted.\n\nExpect tests make it easy to create simple regression tests.  For\nexample, when writing a parser for some data source, we might write a\ntest that consumes and prints out the result of running the parser\nover some sample data.  
Similarly, with systems that contain complex\nstate machines, we often write expect tests that sequence a set of\ntransactions, periodically dumping out summaries of the internal state\nof the system.\n\nThis is useful as a way to verify the behavior of new code, but the\nreal value is how it helps reviewers understand changes to existing\ncode. Reading the diffs of these program traces provides a way to\nvisualize the way in which the semantics of your program is changed by\na given feature.\n\nIn this style, you may not write an expect test for every change, but\nyou do expect nearly every semantic change to be reflected in one way\nor another, either via a new test, or via a diff to an old one.\n\nSmall changes\n\nThe other aspect of this approach is that features are kept small,\nmostly by breaking up large changes into chains of smaller features.\nSometimes the initial author will do this from the get-go, and\nsometimes a reviewer helps to break down what starts as a monolithic\nfeature.\n\nThe goal is to express a large change as a sequence of smaller ones\nthat are themselves coherent enough to be read, tested, and often even\nreleased on their own.  This dovetails with expect tests, since the\nexpect tests make the effect of each feature easier to comprehend, and\nthe fact that the semantic changes are small makes the diffs of the\nexpect tests easier to read.\n\nSometimes, the program trace you want for demonstrating the effect of\na change isn’t there yet, in which case you can mint a parent feature\nthat adds the program trace, so you can read the diff in the\nsubstantive feature.\n\nThis is particularly valuable when squashing bugs. 
Adding an expect\ntest in one feature that demonstrates the buggy behavior, and then\nfixing it in the followup feature, is a good way of convincing\nyourself that the bug really works the way you think it does, and\ndemonstrating that the bugfix resolves the issue.\n\nFast turnaround\n\nAnother aspect of the Iron style is fast review.  That’s easier said\nthan done, of course, but keeping features small and heavily tested\nhelps.  That’s because it takes less mental effort to convince\nyourself that a feature is right if it’s small enough to fit in your\nhead, and there are good tests that demonstrate the behavior.\n\nAt Jane Street, a given change may be reviewed by a decent number of\npeople, particularly if it touches many parts of the codebase.  But\nthere is a special reviewer, called the seconder, whose job is to\nreview the feature in its entirety, and is often a full collaborator\nwith the author on the change in question.\n\nPart of the seconder’s job is to encourage authors to make review\neasier, both by breaking features down into smaller pieces, and by\nadding more tests.  That increases the amount of work that an author\nneeds to do to complete a given code change; but because review needs\nto be done by many people, that effort is generally well spent.\n\nBeyond that, it gives authors more autonomy to get their features\ndone, since they can effectively cut the amount of work they need from\nother people.  And the tests they add in the process have lasting\nvalue, helping prevent future bugs.\n\nThere’s also a nice interplay between small features and fast\nturnaround when it comes to merge conflicts. Small features are less\nlikely to conflict, and when they do, the resolution of the conflict\ntends to be easier to understand. At the same time, small features are\neasier to get out quickly, which reduces the likelihood of conflicts\nyet more.\n\nWhy now?\n\nYou might wonder why this approach emerged now.  
After all, Jane\nStreet has been around for more than 15 years, and building systems in\nOCaml for 12 of those years.\n\nI think the answer has to do with tools.  Tools change the constants\nof your development universe, warping the fabric of your day-to-day\nwork by enough that different equilibria become possible.  The Iron\nstyle depends on a collection of different improvements to our tools,\nand if you take just a few of them away, the style starts to fall\napart.\n\nThe major systems that are relevant here are:\n\n\n  Iron\nitself, which is responsible for code review and release\nmanagement.\n  Our\ninline test framework,\nand especially our support for\nexpect tests.\n  Jenga, our in-house build\nsystem.\n\n\nLet’s talk about each of these individually.\n\nIron\n\nIron is responsible for managing the different changes (called\nfeatures) that are being worked on, organizing both code review and\nthe management and merging of these features.  The push towards long\nchains of small features puts a lot of pressure on Iron’s\nperformance. If a change that would have been one feature becomes\nseven, you need to do seven times as many feature operations. As a\nresult, the performance of those operations becomes absolutely\ncritical to the user experience.\n\nTo that end, we’ve done a lot of work to make the system more\nresponsive and lightweight.  For example, Iron keeps a cache of\npre-populated source checkouts to make creating a new feature almost\ninstantaneous.  Iron also uses the fact that it knows what review a\ngiven user has to do, and uses that information to prefetch the\nrelevant features.\n\nWe’ve also done work to simplify working with chains of\nfeatures. When you have seven features instead of one, you really want\nthe ability to release the entire chain as a single action. We’ve\nbuilt automation in Iron to do just that.\n\nThe need to make review fast also means that you need Iron to act as\nan effective communication mechanism. 
Iron comes with a dashboard that\ntells all the users involved what features they need to work on and\nwhat they’re expected to do next.\n\nInline tests\n\nMany developers are reluctant to write tests, and I think a large part\nof the reason isn’t that writing the tests themselves is painful; it’s\nall the work around setting up the tests that people find\ndisheartening.\n\nFor that reason, we’ve long thought it important to make adding a new\ntest require as little bureaucracy as possible. In our codebase, just\nadding a let%expect_test declaration to the source ensures that the\ntest is registered and will be included in our continuous-integration\ntesting, meaning that no one can release a feature that breaks that\ntest.\n\nFor expect tests in particular, we’ve built pretty good editor-side\nsupport, with key bindings for rerunning a test, bringing up the\ncolorized diff showing how a given test failed, and accepting the new\nversion of a test.\n\nThese are small efficiencies, but they add up. The end result is that\nwe do a lot more testing with these tools in place than we did without\nthem.\n\nJenga\n\nJenga has contributed to the Iron style in a number of ways. Probably\nthe single most important thing is compilation speed.  Small features\nare a lot more palatable if switching between features is efficient,\nand a big part of that efficiency comes from Jenga.\n\nJenga is also critical to the automation of tests I mentioned\nabove. Jenga is a key contributor to the low overhead of adding a new\ntest.  And the efficient parallelization that Jenga provides helps\nmake testing faster too, which makes it possible to add more tests.\n\nSumming up\n\nOne of the things that’s become clear to me over the years is that\ntools are critical to keeping people efficient as an organization\ngrows. 
By default, everything gets harder as you get bigger; you\ntry to solve tougher problems; as your software grows there are more\ncomplex interactions between different parts of your infrastructure;\nand the organization itself becomes more complex.\n\nOne key tool for maintaining the ability of each individual developer\nto get things done is to invest in sharpening their tools. And if\nyou do it right, that tool sharpening doesn’t just make the things\nthey’re doing now easier. It can open up new and unexpected ways of\nworking.\n",
        "url"      : "https://blog.janestreet.com/ironing-out-your-development-style/",
        "image"    : "https://blog.janestreet.com/ironing-out-your-development-style/story.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Hiring an FPGA engineer",
        "date"     : "August 16, 2017",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "Jane Street is looking to hire an engineer with experience in both\nsoftware and hardware design to work on FPGA-based applications, and\non tools for creating such applications.\n\nWe’re big believers in the ability of tools to make programming\nfaster, more pleasant, and more reliable. We think the same is true\nfor hardware design, and we’re looking for people with real-world\nexperience in hardware design who are interested in using programming\nlanguage technology to improve the process of designing, testing and\nvalidating hardware designs.\n\nThis role involves working on the ground-up design and implementation\nof new FPGA applications, as well as helping extend and refine the\nhigh-level synthesis and testing tools that we use internally, based\non the HardCaml synthesis library\nfor OCaml.\n\nYou don’t need experience with OCaml in particular, or any experience\nin the financial markets, but we do want people who can approach\nhardware design with a software engineering mindset.  A good\nbackground with some typed functional language and experience with\nusing FPGAs in the context of Ethernet networking are both pluses.\n\nWe’re also especially (but not exclusively) interested in people who\nhave experience in other high-level synthesis tools like\nChisel,\nLava, and\nBluespec.\n\nYou can apply\nhere.\n",
        "url"      : "https://blog.janestreet.com/hiring-an-fpga-engineer/",
        "image"    : "https://blog.janestreet.com/hiring-an-fpga-engineer/fpga_hiring.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "What the interns have wrought, 2017 edition",
        "date"     : "August 14, 2017",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : ["internship"],
        "minsToRead" : 10,
        "content"  : "Intern season is coming to a close, and it’s a nice time to look back\n(as I’ve done in\nprevious\nyears) and review some of what\nthe interns did while they were here. The dev intern program has grown\nconsiderably, with almost 40 dev interns between our NY, London, and\nHong Kong offices.\n\nGiven that each intern does at least two projects in separate areas\nover the summer, there are a lot of projects to describe. And they\nreally run the gamut across Jane Street’s departments and codebases. A\nfew examples off the top of my head: building an incremental\npacket-capture database; creating tools for visualizing message rates\non exchange lines; adding tracing tools for Async; implementing an\nOCaml API to Nessus; working on algorithms for efficient scheduling in\na parallel/incremental system; monitoring the state of our networking\nswitches.\n\nRather than try in vain to survey the full project list, I’ve picked\nout a few to go into a bit more depth on.\n\nCaching Snapshots\n\nThis one has to do with the somewhat obscure world of marketdata, so a\nlittle background seems in order.\n\nMarketdata is roughly speaking the data feeds published by\nsecurities exchanges. These feeds provide you enough information to\ndetermine the full set of open orders that are resting on the exchange\nat any given point in time. This is typically done by providing a\ndetailed, transaction-by-transaction log of what happened on the\nexchange, including every order added, cancelled or traded.\n\nTo correctly interpret such a feed, you need to see messages in order\nand without gaps. This lets you rebuild the open order state\nincrementally, starting from a known state.\n\nThese feeds are generally distributed using IP multicast, which\nprovides a scalable way of distributing some of these rather big data\nfeeds to many consumers. 
Because IP multicast is unreliable, you need\ngap-fillers to replay messages that are lost, and snapshot servers\nto provide clients with a known starting state, so they can start in\nthe middle of the day without having to replay a full day’s worth of\nmessages.\n\nExchanges like NASDAQ and ARCA (two US equity exchanges) provide both\ngap-fillers and snapshot servers, but they are on the other end of a\nfinite-bandwidth pipe.  We have some big applications that consist of\nmany individual components that each need to subscribe to the same\nfeeds.  We have our own gap-filler infrastructure, but we don’t\ncurrently have our own snapshot servers.  That means that when these\nsystems are restarted, they all end up requesting snapshots at the\nsame time.  These snapshot storms can really strain the bandwidth on\nour exchange connections.\n\nTo deal with this, intern Maciej Debski worked on a snapshot\ncache. The goal of the cache is to stand between clients and\nexchange-side snapshot servers, and to keep requested snapshots around\nfor a short period of time, say, 30 seconds. This means that if we\nhave a storm of snapshot requests, we don’t have to forward any but\nthe first of those to the exchange-side snapshot server.\n\nThis work was all done in the context of Mu, our system for\nconsuming and normalizing multicast-based marketdata feeds. To make\nthis work, Maciej wrote the snapshot server, created a protocol for\nclients to request snapshots over our internal Async-RPC protocol, and\ndove into the Mu client code to add support for grabbing snapshots\nfrom the proxy instead of from the exchange-side server.\n\nI think it’s a nice reflection both of the Mu codebase and of Maciej’s\ngood work that a working draft of the project was completed in a few\nweeks! It’s not through review yet, but this is serious practical work\nthat we expect will materially improve the quality of our marketdata.\n\nTracking Traits\n\nBugs are an unavoidable part of programming. 
At Jane Street, we put a\nlot of effort into testing and type-level checks, but mistakes still\nslip through. Sometimes, these mistakes are bad enough that it’s not\nenough to simply fix the bug; we have to figure out which concrete\npieces of software are exposed to the bug, so we can do something\nabout it.\n\nJustin Cheng’s intern project was to build a system that would help us\nanswer the question of which pieces of software are exposed to a given\nbug. Most of our code is stored in a single large Mercurial repository\nwith about a million revisions. If a bug was found in one revision and\nresolved in another, we want to be able to figure out whether a given\nrevision is a descendant of the revision where the bug was introduced,\nbut not a descendant of the revision where the bug was resolved. As\nyou can imagine, this is really a graph algorithm at heart.\n\nThe system Justin built is structured around the following concepts:\n\n\n  A gene corresponds to a fact which is seeded at a given\nrevision, and is inherited by all descendant nodes.\n  A genome is the set of all genes that apply to a given revision.\n\n\nA trait corresponds to a pair of genes, one that observes the\ntrait, and one that addresses it. A revision is considered to have\nthe trait if its genome has the observing but not the addressing gene.\n\nFrom a performance point of view, we wanted to be able to do all the\nwork associated with traits efficiently, in both space and time. After\nworking through a bunch of possible designs, Justin ended up\nimplementing a system that eagerly computed the genome of every\nrevision, making trait queries constant time and very fast. In order\nto keep memory usage down, genomes were hash-consed, meaning that each\nunique genome was represented exactly once, with multiple instances\nsimply using the same copy.\n\nThe key, then, was to make the process of seeding genes efficient. 
In\nthe worst case, seeding a gene on the base revision of the entire\nrepository would require a walk over the entire million-node\ndata-structure, since it would affect the genome of every\nrevision. Even though adding and removing of seeds is relatively rare,\nwe wanted it to be fast so that it wouldn’t interfere with the\navailability of the Iron server,\nwithin which all of this trait tracking would be housed.\n\nPart of the solution is to take advantage of the fact that Mercurial\nalready presents the graph of revisions in a topologically sorted\norder, meaning that it’s easy to update the gene graph by simply\nwalking it in that order. In addition, by keeping the data structure\ntight and minimizing the amount of work that needs to be done per\nnode, even the worst-case walk could be made quite fast indeed. By the\nend, Justin got the worst-case walk down to 80ms, which is fast enough\nfor our purposes. And that’s for the worst case; the common case of\nplacing a seed on a fairly recent revision should take only a handful\nof milliseconds.\n\nThis project also nicely samples all the different kinds of work it\nrequires to build production software. In addition to doing the\nalgorithmic work, Justin was responsible for integration and testing:\nhooking traits into Iron, adding a UI for interacting with traits, and\nwriting functional tests.\n\nDeriving Diffs\n\nI’ve written before about\nsome of the ways in which diffs are an effective software engineering\ntool. One place it shows up a lot is in transferring data between\nsystems efficiently and incrementally.\n\nIn particular, consider an application that has a master process that\nmaintains a large and complex state, and a collection of worker\nprocesses that also maintain continuously updated copies of that\nstate.  Instead of transferring to the workers the entirety of the\nstate whenever the master is updated, we’d rather just send diffs. 
To\ndo that, we need types for representing diffs as well as code for\ncomputing diffs and applying patches.\n\nConsider the following trivial record type.\n\ntype t =\n  { a: int\n  ; b: float\n  ; c: string\n  }\n\n\n\nYou could write the following type to represent diffs to values of\ntype t.\n\ntype diff_t =\n  | A of int\n  | B of float\n  | C of string\n\n\n\nThis highlights that there’s a mechanical relationship between the\nshape of the type and the shape of the diff type. Rather than having\nto write this kind of mechanical code by hand, for his intern project,\nTomasz Syposz wrote a new syntax extension called ppx_diff, which\ncreates both the types and the code for working with them\nautomatically.\n\nMuch of the complexity of the project comes from thinking through all\nthe corner cases, and understanding how one should construct a diff in\neach case. That includes handling the various different type\nconstructors in OCaml, primarily records, variants and tuples. It also\nmeans thinking through how to deal with polymorphic types.\n\nIn addition, there are some design decisions that need to be left to\nthe user. In particular, the code generator isn’t well positioned to\nmake decisions about exactly how fine-grained the diffs should be. For\nexample, if you have a nested record, like the following:\n\ntype u =\n  { foo: int\n  ; bar: float\n  }\n\ntype t =\n  { baz: u\n  ; quuk: u\n  }\n\n\n\nShould the diff type go one level deep, like this:\n\ntype diff_t = Baz of u | Quuk of u\n\n\n\nOr two levels deep?\n\ntype diff_u = Foo of int | Bar of float\ntype diff_t = Baz of diff_u | Quuk of diff_u\n\n\n\nInstead of deciding entirely on its own, ppx_diff has room for user\nannotations to decide when the diff granularity should go down to the\nnext level of the data type.\n\nAll in, ppx_diff is a nice example of the power of syntactic\nabstractions. 
In this case, being able to operate at the level of\nsyntax allows us to get rid of a huge amount of drudgery in a way that\nsimply wouldn’t be possible in the host language on its own, while\nstill maintaining the guarantees of a strongly typed protocol.\n\nBecoming an intern\n\nThis is just a taste of the kinds of projects that software\ndevelopment interns work on each summer at Jane Street. If you find\nthese projects exciting, you should\napply!\n\nIf it sounds like these projects require background you don’t have,\ndon’t worry. You don’t have to know anything about functional\nprogramming or OCaml or the financial markets to join us for the\nsummer. We’re looking for smart and effective programmers and we\nare happy to teach them what they need to know.\n",
        "url"      : "https://blog.janestreet.com/what-the-interns-have-wrought-2017/",
        "image"    : "https://blog.janestreet.com/what-the-interns-have-wrought-2017/what_interns_wrought.png",
        "topic"    :  ["technology","internship"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "When Bash Scripts Bite",
        "date"     : "May 11, 2017",
        "authorId" : "tlubin",
        "author"   : "Todd Lubin",
        "tags"     : [],
        "minsToRead" : 3,
"content"  : "There are abundant resources online trying to scare programmers away from using\nshell scripts. Most of them, if anything, succeed in convincing the reader to\nblindly put something that resembles\n\nset -euo pipefail\n\n\n\nat the top of their scripts. Let’s focus on the “-e” flag. What does this do?\nWell, here are descriptions of this flag from the first two results on Google\nfor “writing safe bash scripts”:\n\n\n  “If a command fails, set -e will make the whole script exit, instead of\njust resuming on the next line” (https://sipb.mit.edu/doc/safe-shell/)\n  “This tells bash that it should exit the script if any statement returns a\nnon-true return value.”\n(http://www.davidpashley.com/articles/writing-robust-shell-scripts/)\n\n\nUnfortunately, this is bash we are talking about and the story is never that\nsimple.\n\nA couple of months ago, a particular production bash script (if that doesn’t sound\nhorrifying, hopefully it will by the end of this post) failed in the worst kind\nof way: silently. The script generates a list of valid users at Jane Street and\npushes this out to our inbound mail servers. It looks something like:\n\nset -euo pipefail\n...\necho \"($(ldap-query-for-valid-users))\" &gt; \"/tmp/all-users.sexp\"\n...\npush-all-users-if-different\n\n\n\nOn this one particular day, a file was deployed with the contents “()”. But why\ndidn’t set -e cause the script to exit when ldap-query-for-valid-users\nfailed? A quick look at the bash man page answers this question. It turns out\nthat there are a couple of surprising subtleties to this flag. Here are two of them:\n\nset -e works on “simple commands”\n\nA script will exit early if the exit status of a simple command is nonzero. So\nhow is a simple command executed? In short, bash does all expansions and checks\nto see if there is still a command to run. If there is a command to run, the\nexit status of the simple command is the exit status of the command. 
If there is\nnot a command to run, the exit status of the simple command is the exit status\nof the last command substitution performed. Here are some example commands that\nall have exit status 0, so would not cause a set -e script to exit:\n\n# echo, local and export are commands that always have exit status 0\necho \"$(/bin/false)\"\nlocal foo=\"$(/bin/false)\"\nexport foo=\"$(/bin/false)\"\n\n# the last command substitution has exit status 0\nfoo=\"$(/bin/false)$(/bin/true)\"\n\n\n\nset -e does not get passed to subshells in command substitution (without --posix)\n\nHere is an example consequence of this:\n\nset -e\n\nfoo() {\n    /bin/false\n    echo \"foo\"\n}\necho \"$(foo)\"\n\n\n\nRunning this script with bash will print “foo” while running this with\nbash --posix (or sh) will not. Both scripts will exit with status 0.\n\nTangible takeaway\n\nThis is not to say that something like set -euo pipefail should not be used at\nthe top of all bash scripts, but it should not give you a false sense of\nsecurity. Like all production code, you must reason about all failure conditions\nand ensure they are handled appropriately. Even if you are some kind of bash\nexpert who knows all these subtleties, chances are your peers do not. The\nexecution of shell scripts is subtle and confusing, and for production code,\nthere is likely a better tool for the job.\n",
        "url"      : "https://blog.janestreet.com/when-bash-scripts-bite/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Looking for a technical writer",
        "date"     : "May 1, 2017",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "Update: I’m excited to say that we’ve now hired a (great!) technical\nwriter, so the position is closed.\n\nJane Street is looking to hire a technical writer.\n\nWe’ve always believed that developers should spend time and effort documenting\ntheir own code, but at the same time, a great writer with a feel for the\ntechnology can raise the level of quality in a way that few developers can. And\nas we’ve grown, having someone dedicated to writing makes a ton of sense.\n\nHere are the kinds of things we’d like to have a technical writer work on:\n\n\n  Training material. We have a training program that many new hires go\nthrough, including most new developers and all new traders. In that program,\nthey learn about OCaml, our base libraries, our build system, the UNIX\nshell, Emacs, and our dev tools. Part of the job would be to help make the\ncourse better, both by improving what we have, and by adding new material.\n  Improving library documentation. While we expect developers to do a\nreasonable job of documenting their code, our most important libraries\ndeserve the time and care to make them really shine. This is aimed both\ninternally and externally, since a lot of these libraries, like Async, Core\nand Incremental, are open source.\n  Writing longer pieces. We need more tutorials and overviews on a variety\nof topics. Part of the work would be to create great new documentation, and\npart of it is to serve as an example for others as to what good\ndocumentation looks like. And where possible, we want to do this so that the\ndocumentation effectively compiles against our current APIs, preventing it\nfrom just drifting out of date.\n\n\nIn terms of skills, we want someone who is both a clear and effective written\ncommunicator, and who is good enough at programming to navigate our codebase,\nwork through our tutorials, and write up examples. 
An interest in functional\nprogramming and expressive type systems is a plus, but you don’t need to know\nany OCaml (the language we use). That’s something we’re happy to teach you here.\n",
        "url"      : "https://blog.janestreet.com/looking-for-a-technical-writer/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Caveat Configurator: how to replace configs with code, and why you might not want to",
        "date"     : "April 25, 2017",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 1,
"content"  : "We have a new tech talk coming up on\nMay 17th, from our very own Dominick LoBraico. This one is about how to\nrepresent configurations with programs. In some sense, this is an obvious idea.\nLots of programmers have experienced the dysphoria that comes from watching your\nelegant little configuration format metamorphose into a badly constructed\nprogramming language with miserable tools. This happens because, as you try to\nmake your configs clearer and more concise, you often end up walking down the\nprimrose path of making your config format ever more language-like. But you\nnever really have the time to make it into a proper language.\n\nThe obvious alternative is to just use a real language, one that comes with\ndecent tooling and well-designed abstractions. (And ideally, a functional\nlanguage, because they tend to be better at writing clear, declarative code.)\n\nThis talk discusses what happens when you try to put this obvious idea into\npractice, which is less, well, obvious. I like this kind of topic because it\nrepresents the kind of hard-won knowledge you can only get by trying something\nand screwing it up a few times.\n\nSo please join us! You can register\nhere.\n",
        "url"      : "https://blog.janestreet.com/caveat-configurator-how-to-replace-configs-with-code-and-why-you-might-not-want-to/",
        "image"    : "https://blog.janestreet.com/caveat-configurator-how-to-replace-configs-with-code-and-why-you-might-not-want-to/dominick-talk.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "This is not the performance you were looking for: the tricks systems play on us",
        "date"     : "April 20, 2017",
        "authorId" : "amyers",
        "author"   : "Andy Myers",
        "tags"     : [],
        "minsToRead" : 5,
        "content"  : "It’s often surprising just how much software performance depends on how the\nsoftware is deployed. All the time and effort you’ve invested in optimization\ncan be erased by a few bad decisions in scheduler policy, affinity, or\nbackground workload on a server.\n\nSo here are a few things I check for when an app’s performance is unexpectedly\nbad. These are things that should apply to any OS running on a modern server,\nbut the specific tools I’ll mention are for Linux.\n\nAre you running the binary you think you are?\n\nIt’s funny how often seemingly bizarre problems have simple explanations. So I\nstart by checking that I’m really running the right binary. I use md5sum to\nget a hash:\n\nmd5sum /path/to/your/executable\n\n\n\nThen I verify that the hash matches the md5sum of the app I was trying to\ndeploy. On Linux, you do the same trick to check a running binary via the proc\nfilesystem if you know the process’s PID:\n\nmd5sum /proc/$PID/exe\n\n\n\nAre the dynamic libraries in use the same?\n\nSometimes the same app will perform unexpectedly because the dynamic libraries\nare not what you’d expect. Ldd can tell you which libraries will be linked at\nstartup time:\n\nldd /path/to/your/executable\n# or\nldd /proc/$PID/exe\n\n\n\nDid you affinitize the process?\n\nWith Linux, you can restrict the cores that a process runs on. That can be a\nbenefit because it helps keep the process’s data warm in the processor’s cache.\nFor a single-threaded app, affinitizing to a single core might be the right\nchoice, but a busy multi-threaded app may require multiple cores.\n\nAnd you can see which cores a process is able to run on via taskset -p $PID.\nTaskset can also be used to control which cores a process runs on.\n\nDon’t forget about NUMA effects\n\nModern servers use\nNUMA, which means\nthat latency and throughput to RAM, disk or the network depends on which core an\napplication is running on. 
Though the penalty is small for each operation (in\nthe range of hundreds of nanoseconds), when aggregated across an application the\neffect can be noticeable.\n\nKeep each application close to the things it uses. If an application uses the\nnetwork, then affinitize the application to a core that’s on the same NUMA node\nas the network adapter that it’s using.\n\nOn Linux, you can see the topology of your hardware using numactl -H. Here’s\nsample output:\n\navailable: 2 nodes (0-1)\nnode 0 cpus: 0 2 4 6 8 10 12 14\nnode 0 size: 65442 MB\nnode 0 free: 63882 MB\nnode 1 cpus: 1 3 5 7 9 11 13 15\nnode 1 size: 65536 MB\nnode 1 free: 63515 MB\nnode distances:\nnode   0   1\n  0:  10  21\n  1:  21  10\n\n\n\nThe output tells you that there are 2 nodes, each with 64 GB of RAM and 8 cores.\n\nWhat about other processes?\n\nJust because you affinitized your app to a specific core doesn’t mean that other\napps won’t also use that core. So once you start affinitizing one app, you’ll\nwant to affinitize the other apps on the server as well.\n\nFor a while now, the Linux kernel has had a command-line option to reserve cores from\nboot time: isolcpus. For instance, booting Linux with the kernel parameter\nisolcpus=1,3-5 tells the kernel that by default, no process should be\nscheduled on cores 1, 3, 4 and 5. However, we as well as\nothers\nhave found that isolcpus can lead to unintended behavior where load is\nconcentrated rather than spread across cores, so we don’t use it.\n\nAffinity and other hardware\n\nIf an app uses a lot of peripherals (e.g. 
network or storage), make sure the app\nis affinitized to the same NUMA node as the peripheral.\n\nTo check the NUMA node of an ethernet device, you can use sysfs:\n\ncat /sys/class/net/$ETH/device/numa_node\n\n\n\nThe Linux tool hwloc-ls will also tell you how system components map to NUMA\nnodes.\n\nMachine setup\n\nSometimes the problem isn’t with how the software is deployed; the\nperformance difference comes from the machine itself: either its hardware or\nsoftware setup is not quite what you’d expect.\n\nPerformance on a virtual machine is often quite a bit worse than on a physical\nmachine. You can check if a machine is virtual by looking for the hypervisor\nflag in /proc/cpuinfo:\n\ngrep -q '^flags.* hypervisor.*' /proc/cpuinfo && echo this is a VM\n\n\n\nIs this the software you expect?\n\nFor starters, you can figure out the version of the Linux kernel on a machine\nwith uname -a. Different kernels can behave very differently on the same\nworkload.\n\nYou can also use your OS package manager to list all the packages and versions.\nOften I’ll run the same command on two servers and diff the output:\n\nfunction hdiff () { diff -u &lt;(ssh $1 $3) &lt;(ssh $2 $3); }\n\n\n\nYou can use this to diff the software installed on two hosts:\n\nhdiff $host1 $host2 \"dpkg -l\"\n\n\n\nIs the hardware what you expect?\n\nAs a first step, check the processor model and speed via cat /proc/cpuinfo.\n\nDMI can tell you\nmany things about the hardware you’re dealing with:\n\nhdiff $host1 $host2 \"sudo /usr/sbin/dmidecode\"\n\n\n\nThe output of dmidecode is huge and very detailed. One thing to pay particular\nattention to is the version of the BIOS:\n\nBIOS Information\n  Vendor: Computers Inc.\n  Version: 1.5.1\n  Release Date: 06/23/2012\n\n\n\nFinally, when dealing with the unexpected, it never hurts to check whether the\nserver you’re running on has been rebooted recently enough:\n\n$ uptime\n 13:16:41 up 300 days, 9:21, 1 user, load average: 0.00, 0.00, 0.00\n\n\n\nThree hundred days is way too long.\n\nSummary\n\nAdvances in server architecture have led to spectacular performance gains, but\noften the gains are only realized when apps are tuned properly. This post only\nscratches the surface of the issues in performance tuning. Still, I’ve found\nthese tools useful and I hope you will too.\n\nBy the way, if you enjoy solving these sorts of problems, Jane Street is\nhiring!\n",
        "url"      : "https://blog.janestreet.com/this-is-not-the-performance-you-were-looking-for-the-tricks-systems-play-on-us/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Trivial meta-programming with cinaps",
        "date"     : "March 20, 2017",
        "authorId" : "jdimino",
        "author"   : "Jeremie Dimino",
        "tags"     : [],
        "minsToRead" : 17,
"content"  : "Every now and then, I find myself having to write some mechanical and repetitive\ncode. The usual solution for this is to write a code generator; for instance in\nthe form of a ppx rewriter in the case of OCaml code. This however comes with a\ncost: code generators are harder to review than plain code and it is a new\nsyntax to learn for other developers. So when the repetitive pattern is local to\na specific library or not widely used, it is often not worth the effort.\nEspecially if the code in question is meant to be reviewed and maintained by\nseveral people.\n\nThen there is the possibility of using a macro pre-processor such as cpp, or cppo,\nwhich is the equivalent of cpp but for OCaml. This can help in some cases but\nthis has a cost as well:\n\n\n  macros generally make the code harder to read\n  errors tend to be harder to understand since they don’t point to where you’d\nexpect\n  you can say goodbye to merlin\n\n\nIn fact, when the repetitive pattern is specific to one particular case and of\nreasonable size, committing and reviewing the generated code is acceptable.\nThat’s the problem Cinaps tries to\nsolve.\n\nWhat is cinaps?\n\nCinaps is an application that reads input files and recognizes special syntactic\nforms. Such forms are expected to embed some OCaml code printing something to\nstdout. What they print is compared against what follows these special forms. 
The\nrest works exactly the same as expectation tests.\n\nThe special form is (*$ &lt;ocaml-code&gt; *) for ml source files,\n/*$ &lt;ocaml-code&gt; */ for C source files and #|$ &lt;ocaml-code&gt; |# for\nS-expression files.\n\nFor instance:\n\n$ cat file.ml\nlet x = 1\n(*$ print_newline ();\n    List.iter (fun s -&gt; Printf.printf \"let ( %s ) = Pervasives.( %s )\\n\" s s)\n      [\"+\"; \"-\"; \"*\"; \"/\"] *)\n(*$*)\nlet y = 2\n\n$ cinaps file.ml\n---file.ml\n+++file.ml.corrected\nFile \"file.ml\", line 5, characters 0-1:\n  let x = 1\n  (*$ print_newline ();\n      List.iter (fun s -&gt; Printf.printf \"let ( %s ) = Pervasives.( %s )\\n\" s s)\n        [\"+\"; \"-\"; \"*\"; \"/\"] *)\n+|let ( + ) = Pervasives.( + )\n+|let ( - ) = Pervasives.( - )\n+|let ( * ) = Pervasives.( * )\n+|let ( / ) = Pervasives.( / )\n  (*$*)\n  let y = 2\n\n$ echo $?\n1\n$ cp file.ml.corrected file.ml\n$ cinaps file.ml\n$ echo $?\n0\n\n\n\nReal example\n\nWhat follows is a real example where using Cinaps made the code much easier to\nwrite and maintain. However, I changed the names for this blog post since this\ncode is not released publicly. Note also that this example shows one way we\nusually write C bindings at Jane Street. It is not meant as a model of how to\nwrite C bindings, and the excellent\nctypes library should be the\ndefault choice in most cases. However, this code pre-dates ctypes and migrating\nit would be quite a lot of work.\n\nThe example itself is part of a C binding that I wrote a few years ago. While\ndoing so I used Core.Flags in order to represent a few C enumerations on the\nOCaml side. 
Core.Flags is a module providing a nice abstraction for\nrepresenting C\nflags.\n\nThe OCaml code looks like what you’d expect from code using Core.Flags:\n\nmodule Open_flags = struct\n  external get_rdonly   : unit -&gt; Int63.t = \"mylib_O_RDONLY\"   [@@noalloc]\n  external get_wronly   : unit -&gt; Int63.t = \"mylib_O_WRONLY\"   [@@noalloc]\n  external get_rdwr     : unit -&gt; Int63.t = \"mylib_O_RDWR\"     [@@noalloc]\n  external get_nonblock : unit -&gt; Int63.t = \"mylib_O_NONBLOCK\" [@@noalloc]\n  external get_append   : unit -&gt; Int63.t = \"mylib_O_APPEND\"   [@@noalloc]\n  external get_creat    : unit -&gt; Int63.t = \"mylib_O_CREAT\"    [@@noalloc]\n  external get_trunc    : unit -&gt; Int63.t = \"mylib_O_TRUNC\"    [@@noalloc]\n  external get_excl     : unit -&gt; Int63.t = \"mylib_O_EXCL\"     [@@noalloc]\n  external get_noctty   : unit -&gt; Int63.t = \"mylib_O_NOCTTY\"   [@@noalloc]\n  external get_dsync    : unit -&gt; Int63.t = \"mylib_O_DSYNC\"    [@@noalloc]\n  external get_sync     : unit -&gt; Int63.t = \"mylib_O_SYNC\"     [@@noalloc]\n  external get_rsync    : unit -&gt; Int63.t = \"mylib_O_RSYNC\"    [@@noalloc]\n\n  let rdonly   = get_rdonly   ()\n  let wronly   = get_wronly   ()\n  let rdwr     = get_rdwr     ()\n  let nonblock = get_nonblock ()\n  let append   = get_append   ()\n  let creat    = get_creat    ()\n  let trunc    = get_trunc    ()\n  let excl     = get_excl     ()\n  let noctty   = get_noctty   ()\n  let dsync    = get_dsync    ()\n  let sync     = get_sync     ()\n  let rsync    = get_rsync    ()\n\n  include Flags.Make(struct\n      let known =\n        [ rdonly   , \"rdonly\"\n        ; wronly   , \"wronly\"\n        ; rdwr     , \"rdwr\"\n        ; nonblock , \"nonblock\"\n        ; append   , \"append\"\n        ; creat    , \"creat\"\n        ; trunc    , \"trunc\"\n        ; excl     , \"excl\"\n        ; noctty   , \"noctty\"\n        ; dsync    , \"dsync\"\n        ; sync     , \"sync\"\n        ; rsync    , 
\"rsync\"\n        ]\n      let remove_zero_flags = false\n      let allow_intersecting = false\n      let should_print_error = true\n    end)\nend\n\n\n\nAnd there are about 3 modules like this in this file, plus the corresponding\nstubs in the C file. Writing this code initially was no fun, and adding new\nflags now that the C library has evolved is still no fun.\n\nThe rest of this section explains how to make it more fun with cinaps.\n\nSetting up and using cinaps\n\nFirst I add a rule in the build system to call cinaps appropriately. I use a\nfew settings specific to our jenga-based builds and it is currently not possible\nto replicate this outside of Jane Street, but assuming you have a Makefile,\nyou can write:\n\n.PHONY: cinaps\ncinaps:\n    cinaps -i src/*.ml src/*.c\n\n\n\nNow whenever you call make cinaps, all the files will be updated in place. You\ncan then do git diff to see what changed.\n\nThen I write a file src/cinaps_helpers. It is a plain OCaml source file; however,\nit is not suffixed with .ml so that it is not confused with a regular module of\nthe library. 
It contains the various bits that are common between the ml/C files\nin the library:\n\n(* -*- tuareg -*- *)\n\nlet stub_prefix = \"mylib_\"\nlet stub name = stub_prefix ^ name\n\nlet open_flags =\n  [ \"O_RDONLY\"\n  ; \"O_WRONLY\"\n  ; \"O_RDWR\"\n  ; \"O_NONBLOCK\"\n  ; \"O_APPEND\"\n  ; \"O_CREAT\"\n  ; \"O_TRUNC\"\n  ; \"O_EXCL\"\n  ; \"O_NOCTTY\"\n  ; \"O_DSYNC\"\n  ; \"O_SYNC\"\n  ; \"O_RSYNC\"\n  ]\n\nlet other_flags =\n  [ ...\n  ]\n\n\nlet yet_other_flags =\n  [ ...\n  ]\n\nlet all_flags = open_flags @ other_flags @ yet_other_flags\n\nopen StdLabels\nopen Printf\nlet pr fmt = printf (fmt ^^ \"\\n\")\n\nlet flags_module module_name flags ~prefix ~allow_intersection =\n  &lt;code to print an Open_flags like module&gt;\n\n\n\nNow, in my original .ml file, I can write:\n\n(*$ #use \"cinaps_helpers\" $*)\n\n(*$ flags_module \"Open_flags\" open_flags ~prefix:\"O_\" ~allow_intersecting:false *)\nmodule Open_flags = struct\n  external get_rdonly   : unit -&gt; Int63.t = \"mylib_O_RDONLY\"   [@@noalloc]\n  external get_wronly   : unit -&gt; Int63.t = \"mylib_O_WRONLY\"   [@@noalloc]\n  external get_rdwr     : unit -&gt; Int63.t = \"mylib_O_RDWR\"     [@@noalloc]\n  external get_nonblock : unit -&gt; Int63.t = \"mylib_O_NONBLOCK\" [@@noalloc]\n  external get_append   : unit -&gt; Int63.t = \"mylib_O_APPEND\"   [@@noalloc]\n  external get_creat    : unit -&gt; Int63.t = \"mylib_O_CREAT\"    [@@noalloc]\n  external get_trunc    : unit -&gt; Int63.t = \"mylib_O_TRUNC\"    [@@noalloc]\n  external get_excl     : unit -&gt; Int63.t = \"mylib_O_EXCL\"     [@@noalloc]\n  external get_noctty   : unit -&gt; Int63.t = \"mylib_O_NOCTTY\"   [@@noalloc]\n  external get_dsync    : unit -&gt; Int63.t = \"mylib_O_DSYNC\"    [@@noalloc]\n  external get_sync     : unit -&gt; Int63.t = \"mylib_O_SYNC\"     [@@noalloc]\n  external get_rsync    : unit -&gt; Int63.t = \"mylib_O_RSYNC\"    [@@noalloc]\n\n  let rdonly   = get_rdonly   ()\n  let wronly   = get_wronly   ()\n  let rdwr     = 
get_rdwr     ()\n  let nonblock = get_nonblock ()\n  let append   = get_append   ()\n  let creat    = get_creat    ()\n  let trunc    = get_trunc    ()\n  let excl     = get_excl     ()\n  let noctty   = get_noctty   ()\n  let dsync    = get_dsync    ()\n  let sync     = get_sync     ()\n  let rsync    = get_rsync    ()\n\n  include Flags.Make(struct\n      let known =\n        [ rdonly   , \"rdonly\"\n        ; wronly   , \"wronly\"\n        ; rdwr     , \"rdwr\"\n        ; nonblock , \"nonblock\"\n        ; append   , \"append\"\n        ; creat    , \"creat\"\n        ; trunc    , \"trunc\"\n        ; excl     , \"excl\"\n        ; noctty   , \"noctty\"\n        ; dsync    , \"dsync\"\n        ; sync     , \"sync\"\n        ; rsync    , \"rsync\"\n        ]\n      let remove_zero_flags = false\n      let allow_intersecting = false\n      let should_print_error = true\n    end)\nend\n(*$*)\n\n\n\nAnd cinaps will check that the text between the (*$ ... *) and (*$*) forms\nis what is printed by flags_module \"Open_flags\" .... I write something similar\nin the .c file. Note the initial (*$ ... $*) form, which is not expected to\nprint anything and is only used for its other side effects.\n\nAdding a new flag becomes trivial: add it to the list in src/cinaps_helpers and\nexecute make cinaps.\n\nPushing the system\n\nNow I decide that I don’t like the fact that all my constant flags are\ninitialized at runtime and I want them to be static constants on the ml side. A\nsimple way to do this is to write a C program that includes the right headers and\noutputs a .ml file defining these constants. 
I use cinaps to write this C file as\nwell:\n\n/*$ #use \"cinaps_helpers\" $*/\n\n#include &lt;stdio.h&gt;\n\n#include &lt;sys/types.h&gt;\n#include &lt;sys/stat.h&gt;\n#include &lt;fcntl.h&gt;\n\nint main()\n{\n  printf(\"open Core\\n\");\n  printf(\"let mk = Int63.of_int_exn\\n\");\n  /*$\n    printf \"\\n\";\n    let len = longest all_flags in\n    List.iter all_flags ~f:(fun f -&gt;\n      pr {|  printf(\"let _%-*s = mk %%d\\n\", %-*s);|} len f len f );\n    printf \"  \" */\n  printf(\"let _O_RDONLY   = mk %d\\n\", O_RDONLY  );\n  printf(\"let _O_WRONLY   = mk %d\\n\", O_WRONLY  );\n  printf(\"let _O_RDWR     = mk %d\\n\", O_RDWR    );\n  printf(\"let _O_NONBLOCK = mk %d\\n\", O_NONBLOCK);\n  printf(\"let _O_APPEND   = mk %d\\n\", O_APPEND  );\n  printf(\"let _O_CREAT    = mk %d\\n\", O_CREAT   );\n  printf(\"let _O_TRUNC    = mk %d\\n\", O_TRUNC   );\n  printf(\"let _O_EXCL     = mk %d\\n\", O_EXCL    );\n  printf(\"let _O_NOCTTY   = mk %d\\n\", O_NOCTTY  );\n  printf(\"let _O_DSYNC    = mk %d\\n\", O_DSYNC   );\n  printf(\"let _O_SYNC     = mk %d\\n\", O_SYNC    );\n  printf(\"let _O_RSYNC    = mk %d\\n\", O_RSYNC   );\n  /*$*/\n  return 0;\n}\n\n\n\nUpdating the various flag modules in the ml code is as simple as editing\nsrc/cinaps_helpers and doing make cinaps:\n\n(*$ flags_module \"Open_flags\" open_flags ~prefix:\"O_\" ~allow_intersecting:false *)\nmodule Open_flags = struct\n  let rdonly   = Consts._O_RDONLY\n  let wronly   = Consts._O_WRONLY\n  let rdwr     = Consts._O_RDWR\n  let nonblock = Consts._O_NONBLOCK\n  let append   = Consts._O_APPEND\n  let creat    = Consts._O_CREAT\n  let trunc    = Consts._O_TRUNC\n  let excl     = Consts._O_EXCL\n  let noctty   = Consts._O_NOCTTY\n  let dsync    = Consts._O_DSYNC\n  let sync     = Consts._O_SYNC\n  let rsync    = Consts._O_RSYNC\n\n  include Flags.Make(struct\n      let known =\n        [ Consts._O_RDONLY   , \"rdonly\"\n        ; Consts._O_WRONLY   , \"wronly\"\n        ; Consts._O_RDWR     , 
\"rdwr\"\n        ; Consts._O_NONBLOCK , \"nonblock\"\n        ; Consts._O_APPEND   , \"append\"\n        ; Consts._O_CREAT    , \"creat\"\n        ; Consts._O_TRUNC    , \"trunc\"\n        ; Consts._O_EXCL     , \"excl\"\n        ; Consts._O_NOCTTY   , \"noctty\"\n        ; Consts._O_DSYNC    , \"dsync\"\n        ; Consts._O_SYNC     , \"sync\"\n        ; Consts._O_RSYNC    , \"rsync\"\n        ]\n      let remove_zero_flags = false\n      let allow_intersecting = false\n      let should_print_error = true\n    end)\nend\n(*$*)\n\n\n\nTweak: indenting the generated code\n\nYou can either write cinaps code that produces properly indented code, or you can\nuse the styler option:\n\n.PHONY: cinaps\ncinaps:\n    cinaps -styler ocp-indent -i src/*.ml src/*.c\n\n\n\nHistory behind the name\n\nI initially wrote this tool while I did some work on the\nocaml-migrate-parsetree\nproject. ocaml-migrate-parsetree was started by Alain Frisch and continued by\nFrederic Bour and aims at providing a solid and stable base for authors of ppx\nrewriters or other tools using the OCaml frontend. I helped a bit during\ndevelopment and did some testing on a large scale while rebasing our ppx\ninfrastructure on top of it.\n\nDue to its nature, this project contains a lot of repetitive code that cannot be\nfactored out other than by using some kind of meta-programming. Initially we had a\nsmall pre-processor that was interpreting a made-up syntax and was working\nlike cpp does. The syntax was yet another DSL, and the code was\ngenerated on the fly. This made the .ml and .mli files harder to understand\nsince you had to decode this DSL in order to understand what the code was.\n\nCinaps replaced this tool and the name was chosen to emphasize that it is not a\npreprocessor. It means “Cinaps Is Not A Preprocessing System”.\n\nStatus\n\nCinaps is published on github and is part of the upcoming v0.9 Jane Street\nrelease. 
The version that is published doesn’t yet support the C/S-expression\nsyntaxes but once the stable release has gone through, an updated version of\nCinaps supporting these syntaxes will be released.\n",
        "url"      : "https://blog.janestreet.com/trivial-meta-programming-with-cinaps/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "One more talk, two more videos",
        "date"     : "March 15, 2017",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 1,
"content"  : "I’m happy to announce our next public tech\ntalk, called Seven\nImplementations of Incremental, on Wednesday, April 5th, presented by yours\ntruly. You can register\nhere.\n\nThe talk covers the history of\nIncremental, a library for building\nefficient online algorithms. The need to update computations incrementally is\npretty common, and we’ve found Incremental to be useful in creating such\ncomputations in a number of different domains, from constructing efficient\nfinancial calculations to writing responsive, data-rich web UIs.\n\nThe ideas behind Incremental aren’t new with us; there is a lot of prior art,\nmost notably Umut Acar’s work on self-adjusting\ncomputations, on which Incremental is most directly modeled.\n\nBut there’s a big gap between the academic version of an idea and a production-ready\ninstantiation, and this talk is about crossing that gap. It discusses the\n7 different implementations we went through and the various mistakes we made\nalong the way towards the current one we use in production.\n\nSo join us. I hope you enjoy seeing what we learned about building this kind of\nsystem, as well as hearing about the hilarious pratfalls along the way.\n\n\n\nOn another note, we have finally posted videos from our two previous talks,\nincluding Brian Nigito’s talk on the architecture of the modern\nexchange, and Arjun Guha’s\ntalk on taming Puppet. And, of\ncourse, you can subscribe to our\nchannel while you’re\nthere.\n",
        "url"      : "https://blog.janestreet.com/one-more-talk-two-more-videos/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "What a Jane Street software engineering interview is like",
        "date"     : "February 28, 2017",
        "authorId" : "sfunk",
        "author"   : "Sebastian Funk",
        "tags"     : ["interviewing"],
        "minsToRead" : 9,
"content"  : "Are you thinking about\napplying to Jane Street\nfor a software engineering role? Or already have a phone interview scheduled but unsure\nwhat to expect? Read on as we walk through an example phone interview with you.\n\nWe want to give you some insight into what a typical Jane Street phone interview\nlooks like and give you a chance to prepare. We’re going to take a look at a\nquestion we call “Memo” which we used to ask regularly (but of course don’t\nask anymore so no need to memorize anything on this page!). As such this post is\nmeant to be a specific case analysis. If you haven’t yet seen it, we recommend\nreading this blog post for a general overview of\nwhat we are looking for in candidates.\n\nGetting started\n\nTo allow us to work on the question together, we’ll use a shared online editor.\nWe’ll provide you with the link to use (either beforehand or during the\ninterview).\n\nWe expect you to write code in a real programming language in the interview, not\npseudo-code. You can use any programming language you’d like, but we strongly\nrecommend you use the one you’re most familiar with and that will let you solve the\nproblem in the best way (it’s fine to change your mind as you’re exploring the\nproblem!). You should be comfortable with basic data structures and APIs in the\nlanguage of your choice.\n\nNote, there are no bonus points for using a functional language like OCaml.\nPlease don’t use OCaml just because you think it will make us happy. We want to\nsee you at your best, so you should use the language you’re most comfortable\nwith. If OCaml isn’t your strongest language, then don’t use it.\n\nIf during the interview you realize you have seen a question before, then please\nlet us know and we can do a different question. As you’ll see in this post,\nknowing a question in advance greatly reduces what we can learn about you!\n\nPart 1 – Basic coding\n\nHave you heard about memoization? Can you carefully describe what it is? 
If you\nhaven’t heard about it, don’t worry. We’ll bring you up to speed. (A good\nintroduction is on Wikipedia.)\n\nNow let’s say there is a function f of type int -&gt; int whose output only\ndepends on the input. f is very expensive to compute. We’d like you to write a\nmemoized version of this function, i.e. another function g of the same type,\nthat returns the same values – g(x) = f(x) for all x – but only does the\nexpensive computation once for each input value.\n\nA typical first solution we’re looking for at this stage uses a hash-table to\nstore calculated results. A possible solution in OCaml might be:\n\nlet memo f =\n  let results = Hashtbl.create 256 in\n  (fun input -&gt;\n     match Hashtbl.find results input with\n     | None -&gt;\n        let result = f input in\n        Hashtbl.add results ~key:input ~data:result;\n        result\n     | Some result -&gt; result)\n\n\n\n(As I said above, you’re not required or expected to write in OCaml but in this\nblog post I’m going to follow my own advice and use the language I’m most\nfamiliar with, which is OCaml. You might also object that this does a\ntest-and-set without a lock, so can’t possibly be thread-safe. Nice spot! For\nthe purpose of this question let’s ignore re-entry to focus on the core ideas.)\n\nWhichever APIs or data structures you end up using for your solution: you should\nbe prepared to talk about how they work and what the complexity of various\noperations is.\n\nPart 2 – Reasoning about and improving your code\n\nCan you think of any issues we could run into when using the function from part\n1? For example, let’s say we run your function in production and notice our\napplication performs significantly worse than before. Quite the opposite from\nwhat we hoped memoization would do! Can you see what the problem might be?\n\nThe big problem is memory usage. 
Our application might call f with lots of\ndifferent inputs and each result will be stored in the hashtable forever – a\nmemory leak! Can you come up with some ideas to improve upon this?\n\nA reasonable approach to control the memory usage is to bound the size of the\nhash-table and to evict things from it in a FIFO fashion. What trade-offs does\nFIFO have versus other eviction schemes? How could you modify your memo function\nto implement FIFO? Let’s aim for O(1) complexity when evicting from the cache.\n\nThere are a few different ways to do this. A good solution keeps a separate\nqueue: when adding a new result to your hashtable, if the size grows beyond the\nbound, then dequeue from the queue and remove that element from the hashtable.\n\nBesides being able to write code to do this, we look for how you communicate\nyour thoughts on the problem and ideas to improve it. We don’t necessarily\nexpect every candidate to immediately jump to the O(1) solution, but we’re\ninterested in the process of talking through this problem and what you can come\nup with.\n\nPart 3 – Going further\n\nAs you probably realize, FIFO can be very inefficient in some use-cases. Let’s\nsay we want to do LRU (least recently used) instead. We still want the\nimplementation to be as efficient as possible. How can you implement this?\n\nOnce more, there are multiple ways to do this. The easiest solution stores\ntimestamps with each value in the hashtable and linearly scans through the table\nwhen evicting. This is O(n). It’s possible to improve to O(log n) using a\nmin-heap or even to O(1) using a doubly-linked list.\n\n\n  Side-note: implementing the most efficient solution One way to get to\nO(1) is to maintain a doubly linked list and make the values in the hash-table\npoint to elements in that list. Then when looking up a cached value in the\nhash-table, you can slice it out of its current position in the list and put\nit at the top in O(1). 
You maintain a separate pointer to the bottom of the\ndoubly linked list to evict in O(1).\n\n\nFew candidates come up with the doubly-linked list immediately, so don’t worry\nif this is not something you thought of straight away. While we might ask you to\nimplement parts of your solution, this part is intended as a discussion to test\nyour ideas for improving complexity. We’ll guide you to the right level of\nabstraction, so you don’t have to worry too much about which details to include.\n\nWe also have various other extensions for this question that make it possible to\nkeep going further as far as time allows.\n\nWhat we look for\n\nAgain, for a good overview see this post.\n\nWhile interviewing, we try to evaluate how well you would fit in our work\nenvironment by collaboratively solving a problem. This means the journey through\nthe interview is a lot more important than the snapshot of the solution at the\nend of it. We are more interested in seeing how you approach a difficult problem\nthan in just capturing a boolean flag of whether you can come up with the solution.\n\nAs a concrete example, we might prefer a candidate that wrote careful and clear\ncode, communicated well and had good insights and ideas along the way, but\ndidn’t get as far through the problem, over another candidate that\nimmediately solved every part but was sloppy and hard to follow.\n\nThis makes it impossible to give a rigid set of requirements of what needs to be\nachieved during the interview. Nevertheless, to give you some rough guidelines:\n\nEvery successful candidate should achieve a bug-free solution for part 1\nrelatively quickly. If some of part 1 sounds unfamiliar to you, it might be\nbetter to hold off applying to give yourself more time to prepare.\n\nMost candidates that we pass also complete part 2 fully in the time of the\ninterview. Strong candidates finish part 3 with a complete solution, but not\nfinishing this part doesn’t mean that you will be rejected! 
As said above, it’s\nwhat happens during the interview that really counts, not the result at the end.\n\nSharing questions afterwards\n\nWe are interested in seeing you approach a difficult problem and walk through\nthe process of deriving a solution with you. If you already know the question in\nadvance (either by finding it online, or by talking to friends who have\npreviously applied), it reduces the efficacy of the interview: We now only get\nto test if you can write code for a problem you’ve already understood and\nsolved, which is a small subset of things we are interested in learning about.\nFor example, if you’ve read through this post, we wouldn’t learn much from\nasking you Memo!\n\nThus we would like to ask you not to share the interview question with anybody\nelse (in person or online).\n\nWhat happens next\n\nWe usually try to get back to you about the outcome of the interview within one\nweek. During busy periods this might take a bit longer, but not hearing from us\nafter 10 days is unexpected and you should definitely poke us.\n\nIf the interview went well, we invite you to come to one of our offices (New\nYork, London or Hong Kong) for a day of onsite interviews. These proceed in much\nthe same way as the phone interview – all questions are technical.\n\nReady to apply?\n\nIf you read this post and feel ready to apply, then simply go\nhere. We have a very\nstraightforward application process – all we need is your resume or CV. If you\nalready have an offer from another firm and are worried about timing please let\nus know early in your process! You can also read our thoughts and experiences\nwith exploding\noffers, something we\ndon’t do but know many other companies use as a recruitment tool. We look\nforward to speaking with you!\n",
        "url"      : "https://blog.janestreet.com/what-a-jane-street-dev-interview-is-like/",
        "image"    : null,
        "topic"    :  ["technology","interviewing"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Jane Street Tech Talks: Verifying Puppet Configs",
        "date"     : "February 16, 2017",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "Our first Jane Street Tech Talk went really well!\nThanks to everyone who came and made it a fun event.\n\nNow it’s time for another. We’re planning for the series to feature a\ncombination of voices from inside and outside Jane Street. This one is of the\nlatter variety: on March 6th, Arjun\nGuha will be presenting On\nVerification for System Configuration\nLanguages, which is about using\nstatic verification techniques for catching bugs in\nPuppet configs.\n\nI’ve known Arjun for years, and he’s a both a good hacker and a serious academic\nwith a real knack for finding good ways of applying ideas from programming\nlanguages to systems problems. Also, he has excellent taste in programming\nlanguages…\n\nI hope you’ll come! You can sign up\nhere.\n",
        "url"      : "https://blog.janestreet.com/jane-street-tech-talks-verifying-puppet-configs/",
        "image"    : "https://blog.janestreet.com/jane-street-tech-talks-verifying-puppet-configs/untangling_puppet.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "How to Build an Exchange",
        "date"     : "January 11, 2017",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "UPDATE: We are full up. Tons of people signed up for the talk, and we’re\nnow at the limit of what we feel like we can support in the space. Thanks for\nall the interest, and if you didn’t get into this one, don’t worry, we have more\ntalks coming!\n\nWe’re about to do the first of what will hopefully become a series of public\ntech talks in our NY office.\n\nThe first talk is on February\n2nd, and is an overview of the architecture of a modern exchange. The talk is\nbeing given by Brian Nigito, and is inspired by our work on JX, a crossing\nengine built at Jane Street. But Brian’s experience is much broader, going all\nthe way back to the Island ECN, which in my mind marks the birth of the modern\nexchange.\n\nI did some work on JX, and one of the things I was struck by is the role that\nperformance plays in the design. In particular, JX uses a simple replication\nscheme based on reliable multicast that relies critically on the components of\nthe system having high throughput and low, deterministic latencies.\n\nThis is a situation where performance engineering is done not so much for\nreducing end-to-end latency, but instead to act as a kind of architectural\nsuperpower, making it possible to build systems in a simpler and more reliable\nway than would be possible otheriwse.\n\nAnyway, I think it’s a fascinating topic. If you’re interested in coming, you\ncan go\nhere\nto get the details and sign up. (We have a registration step so we can get\npeople through building security.)\n",
        "url"      : "https://blog.janestreet.com/how-to-build-an-exchange/",
        "image"    : "https://blog.janestreet.com/how-to-build-an-exchange/build_exchange.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "A brief trip through Spacetime",
        "date"     : "January 9, 2017",
        "authorId" : "lwhite",
        "author"   : "Leo White",
        "tags"     : [],
        "minsToRead" : 4,
        "content"  : "Spacetime is a new memory profiling facility for OCaml to help find space leaks\nand unwanted allocations. Whilst still a little rough around the edges, we’ve\nfound it to be a very useful tool. Since there’s not much documentation for\nusing spacetime beyond this\nreadme, I’ve\nwritten a little intro to give people an idea of how to use it.\n\nGenerating a profile\n\nAs an example of Spacetime in action let’s get a profile for the js_of_ocaml\ncompiler. First we’ll need a Spacetime-enabled OCaml compiler:\n\n$ opam switch 4.04.0+spacetime\n$ eval `opam config env`\n\n\n\nUsing this compiler we build the executable. In this case we just let opam build\nit for us:\n\n$ opam install js_of_ocaml\n\n\n\nNow we run the executable, using the environment variable\nOCAML_SPACETIME_INTERVAL to turn on profiling and specify how frequently\nSpacetime should inspect the OCaml heap (in milliseconds).\n\n$ OCAML_SPACETIME_INTERVAL=1000 js_of_ocaml core_kernel.cma\n\n\n\nExecutables with Spacetime enabled run more slowly and use more system memory\nthan usual, but the contents of the OCaml heap should be unaffected.\n\nRunning the executable produces a file in the current directory:\n\n$ ls | grep spacetime\nspacetime-8045\n\n\n\n8045 was the pid of the process. If your executable forks then there may be\nmultiple Spacetime files.\n\nNow that we have a Spacetime profile, we need to install the\nprof_spacetime profile viewer:\n\n$ opam switch 4.04.0\n$ eval `opam config env`\n$ opam install prof_spacetime\n\n\n\nNext we process the profile:\n\n$ prof_spacetime process -e .opam/4.04.0+spacetime/bin/js_of_ocaml spacetime-8045\nProcessing series...done\n\n$ ls | grep spacetime\nspacetime-8045\nspacetime-8045.p\n\n\n\nThis can take a while – a couple of minutes in this case. The -e option is\nused to pass prof_spacetime the executable that produced the profile. 
This\noption is not strictly necessary, but without it the profile wouldn’t include\nlocations for C code.\n\nWeb viewer\n\nNow we are ready to look at the profile. We’ll start by using the web viewer:\n\n$ prof_spacetime serve -p spacetime-8045.p\nProcessing series...done\nServing on 127.0.0.1:8080\n\n\n\nLive words\n\nNavigating to the appropriate address in a browser, we are greeted by an\nexciting colourful graph:\n\n\n\nThis graph shows the number of live words in the program over time, divided up\nby the source location at which the words were allocated. If we place our mouse\nover a section of the graph it will display the source location for that\nsection:\n\n\n\nand by clicking on a section we are taken to a new graph:\n\n\n\nThis graph contains only those live words allocated at the clicked source\nlocation. It is divided up by the source location of the call to the function\nwhich performed the allocation – i.e. the next frame up the backtrace. This is a\nkey feature of Spacetime: not only can you see that these words were allocated\nby List.map, you can see which call to List.map allocated them. By continuing to\nclick on these graphs we can get the entire backtrace when some words were\nallocated:\n\n\n\nClicking on the (top of stack) link returns us to the original graph.\n\nAllocated words\n\nThe live graphs (“Live words” and “Live blocks”) are useful for locating space\nleaks. For removing unwanted allocations, the allocation graph is more useful.\nBy clicking the “All allocated words” link we are shown a new graph:\n\n\n\nThis graph shows the cumulative total of allocations in the program, divided up\nby the source location of those allocations. Holding your mouse over a section\nwill display the location of that section. 
Clicking on a section will take you\nto a new graph containing only the allocations from that section, divided up by\nthe location of the next frame up the backtrace.\n\nTerminal viewer\n\nNow we’ll try the terminal viewer:\n\n$ prof_spacetime view -p spacetime-8045.p\n\n\n\nwhich launches us into a lambda-term style terminal view:\n\n\n\nThis shows the live words at a particular time (1.017844s) in the program’s\nexecution divided up by the source location at which the words were allocated.\nThe ← and → keys move between different points in time. The ↑ and ↓ keys select\ndifferent rows. Pressing return on a row loads a new view:\n\n\n\nThis shows the live words allocated at the selected source location (memory.c:\n552) divided up by the source location of the call to the function containing\nthe allocation – i.e. the next frame up the backtrace. Pressing backspace takes\nus back to the previous view.\n\nFinally, pressing tab switches between the three different modes: live words,\nlive blocks and allocated words. Use the q key to exit the viewer.\n",
        "url"      : "https://blog.janestreet.com/a-brief-trip-through-spacetime/",
        "image"    : "https://blog.janestreet.com/a-brief-trip-through-spacetime/spacetime.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "A solution to the ppx versioning problem",
        "date"     : "November 8, 2016",
        "authorId" : "jdimino",
        "author"   : "Jeremie Dimino",
        "tags"     : ["ocaml","ppx"],
        "minsToRead" : 12,
        "content"  : "Ppx is a preprocessing system for OCaml where one maps over the OCaml abstract\nsyntax tree (AST) to interpret some special syntax fragments to generate code.\n\nPpx rewriters get to work on the same AST definition as the compiler, which has\nmany advantages:\n\n\n  \n    The AST corresponds (almost) exactly to the OCaml language. This is not\ncompletely true as the AST can represent programs that you can’t write, but\nit’s quite close.\n  \n  \n    Given that the compiler and pre-processor agree on the data-type, they can\ncommunicate between each other using the unsafe [Marshal] module, which is\na relatively cheap and fast way of serializing and deserializing OCaml\nvalues.\n  \n  \n    Finally, the biggest advantage for the user is that the locations in the\noriginal code are exactly preserved, which is a requirement to get usable\nerror messages. This is not so great for the generated code, as the best one\ncan do is reuse some locations from the original source code and hope for\nthe best. In practice the user sometimes gets non-sensical errors, but this\nis a commonly accepted trade-off.\n  \n\n\nThere is however one drawback to all this, the compiler AST is not stable and\ncode using it is at the mercy of its evolution. We got lucky with the 4.04\nrelease of OCaml but the 4.03 one was quite disruptive. Even before releases,\nwhenever the AST definition changes during a development cycle, many ppx might\nnot be usable for a while, which make testing a lot harder.\n\nSeveral ideas have been flying around, such as adding a layer to convert between\ndifferent versions of the AST. While this would work, it has the drawback that\nyou need this layer for every variant of the AST. 
And when you want to make a\npatch modifying the AST, you’ll need to do the extra work of updating this layer\nfirst.\n\nIn this blog post we show how we managed to solve the ppx compatibility problem\nin a way that improves the user experience and lets us produce releases that\ndon’t depend on ppx rewriters at all.\n\nWe did this work while working on Base, our upcoming standard library. In the\nend, it’s likely we’ll use only the third of the methods described below for\nBase, while the others will be used to improve the user experience with the rest\nof our packages.\n\nWhat do other code generators do?\n\nPpx is not the first system for generating code that is a mix of user-written\ncode and machine-generated code. A typical class of generators that get it\nright, i.e. that preserve locations and are independent of the AST\ndefinition, are lexer/parser generators, and not only the ones distributed with\nthe compiler.\n\nLet’s take the example of lexer generators (parser generators work basically the\nsame way). The users write a series of rules, consisting of a regular expression\nand an action to take if the input matches it:\n\nrule token = parse\n| \"+\"  { PLUS  }\n| \"-\"  { MINUS }\n| '0'..'9'+ as s { INT (int_of_string s) }\n| \" \"* { token lexbuf (* skip blanks *) }\n\n\n\nThis code is written in a .mll file and the generator then produces a .ml\nfile with code for the lexing engine interleaved with the user-written actions.\n\nIn order to keep the locations of the user-written code pointing to the right\nplace in the .mll file, the generator produces:\n\n# 42 \"lexer.mll\"\n         token lexbuf (* skip blanks *)\n\n\n\nThe OCaml compiler interprets the line starting with a # and updates its\ncurrent location to point to the beginning of line 42 in file lexer.mll. 
This\nis called a line directive.\n\nTo go back into the generated code, the lexer generator produces:\n\n# 100 \"lexer.ml\"\n\n\n\nwhere 100 corresponds to the real line number in lexer.ml.\n\nWith this method, when there is an error in the user-written code, it points to\nthe lexer.mll file, while when there is an error in the generated code it\npoints to the lexer.ml file. Even if the generated code might not be\nparticularly easy to understand, at least you get to see the real code the\ncompiler chokes on.\n\nAnother big advantage is that when using a debugger, you can follow the\nexecution through the generated code.\n\nCan we do the same for ppx?\n\nAt first glance, it seems that ppx rewriters work in a very different way, but\nthe result is the same: only parts of the file are generated and the rest is\ntaken as-is from what the user wrote. In fact, compared to the lexer case, most\nof the resulting code is user-written.\n\nThere is however some work to do to get the same result as with lexer\ngenerators. First you have to distinguish the generated code from the user code.\n\nIf you take a ppx rewriter as a black box, then the only way is to apply some\nkind of tree diff between the input and the output. In our ppx framework\nhowever, we know exactly what fragments of the AST are rewritten by plugins and\nwe know the rewriting is always local. This makes the job a lot simpler and\nprobably faster as well, so we chose to take advantage of this information.\n\nThe method\n\nIt works this way: while mapping the AST, we collect all the fragments of\ngenerated code with the location of the code they replace in the original file.\nAt the end we sort them in the order of the file and make sure there is no\noverlap. Every fragment is pretty-printed to a string.\n\nWhat we end up with is a list of text substitutions: beginning position, end\nposition, replacement text. The next step is to simply apply these substitutions\nto the original file. 
If you read the blog post about how we switched from camlp4\nto ppx, you’ll notice the resemblance here.\n\nThis is what the transformation looks like:\n\n(* ----- input ----- *)\ntype t = int [@@deriving sexp_of]\n\nlet f x = x + 1\n\nlet g x = [%sexp_of: t] x\n\n(* ----- output ----- *)\n# 1 \"foo.ml\"\ntype t = int [@@deriving sexp_of]\n# 4 \"foo.ml.pp\"\nlet sexp_of_t = sexp_of_int\n# 2 \"foo.ml\"\n\nlet f x = x + 1\n\nlet g x =\n# 11 \"foo.ml.pp\"\nsexp_of_t\n# 5 \"foo.ml\"\n                        x\n\n\n\nThe result for [@@deriving sexp_of] is not bad at all. For code rewritten\ninside expressions, the result is not as good given that it breaks those\nexpressions up. But given that extensions are often sparse in our source files,\nthis is still acceptable.\n\nThis mode can be selected with ppx_driver-based rewriters by passing the flag\n-reconcile.\n\nSolving the compatibility problem\n\nWith this mode, one can first generate a .ml.pp file and feed that to the\ncompiler. Given that the concrete syntax of the language breaks much less often\nthan the internal AST definition, a working ppx is likely to work for a very\nlong time.\n\nWe’ll soon start releasing a separate package that snapshots one version of the\nlexer/parser/AST/printer of the OCaml compiler. 
This package will have its own\nrelease schedule and will typically be updated soon after each release of OCaml.\nThis will give time for ppx authors to upgrade their code when it breaks while\nstill allowing people to try out the new compiler with their favorite packages.\n\nMode for release tarballs\n\nIn addition to the mode described above, ppx_driver has a second mode\n-reconcile-with-comments where the result is similar to the one with line\ndirectives except that the generated code is enclosed in comment markers:\n\ntype t = int [@@deriving sexp_of]\n(* GENERATED CODE BEGIN *)\nlet sexp_of_t = sexp_of_int\n(* GENERATED CODE END *)\n\nlet f x = x + 1\n\nlet g x =\n(* GENERATED CODE BEGIN *)\nsexp_of_t\n(* GENERATED CODE END *)\n                        x\n\n\n\nThis mode is intended for release tarballs. One can replace all the files\nin-place by the pre-processed version using -reconcile-with-comments. The\nresult is readable and has the big advantage that you don’t need to depend\non the ppx rewriter, which means the package is faster to install for users.\n\nJane Street packages will eventually move to this scheme, either for the next\nstable release or the one after that. One technical issue with this method is\nthat to take full advantage of it, the runtime support libraries of the various\nppx rewriters must be installable without the rewriter itself. Splitting the\npackages in opam is fine but splitting the repository is not desirable as often\nboth components make strong assumptions about each other.\n\nFor Jane Street packages, we’ll need to update our release system so that it\nsupports generating two opam packages from one repository.\n\nppx as a verification tool only\n\nWhile these new methods improve the ppx story in general, for Base we wanted to\ngo even further and allow users to build Base without the need for ppx at all,\nboth for the release and for the development versions. 
Not only to cut down the\ndependencies, but also to provide a better experience in general. For instance\nif you are working on a patched compiler and need the development version of\nBase, you shouldn’t need all the ppx rewriters that might not work for some\nreason.\n\nWe explored various bootstrap stories, and while they worked they were not very\nnice, especially for such an important library. Its development and build\nprocesses should be straightforward.\n\nWe even looked into not using ppx at all. While this is OK for many ppx\nrewriters that are mostly syntactic sugar, it is more problematic for\n[@@deriving ...]. It’s not so much that the code is hard to write by hand,\nmost data-structures in Base are either simple datatypes or require hand-written\ncombinators anyway, but it is a pain to review. This code is very mechanical and\nyou have to make sure that the constant strings correspond to the\nconstructor/field names and other things where the machine can do much better\nthan a human.\n\nIn the end we found a solution to keep the best of both worlds, i.e. being\nable to build the original source code without pre-processing and avoid having\nto write and review this boilerplate code.\n\nThe idea is to use ppx in the same way that we write expect tests; the tool only\nchecks that what comes after the type definition corresponds to what the\nrewriters derive from it. In case of mismatch it produces a .corrected file just\nlike expect tests.\n\nWe are currently experimenting with this method for Base. It’s possible that\nwe’ll have some marker to delimit the end of the generated code. 
In the end the\ncode could look like this:\n\ntype t = A | B [@@deriving sexp_of]\n\nlet sexp_of_t = function\n  | A -&gt; Sexp.Atom \"A\"\n  | B -&gt; Sexp.Atom \"B\"\n\n[@@@end_of_derived_code]\n\n\n\nGiven that the compiler ignores attributes it doesn’t understand, this code\ncompiles just fine without any pre-processing.\n\nWhen running the ppx rewriter in this expect mode, the generated AST is matched\nagainst the source AST without taking locations into account, which means that\nyou can reformat the code as you wish and even add comments.\n\nThe challenge now is to update our ppx rewriters so that they produce code that\nwe are happy to show. Until now we didn’t focus too much on that, but we have a\ngood idea about how to do it. The plan is to move more of the logic of the\nvarious deriving systems into proper functions instead of generating more code.\nNote that this is an improvement in general as proper functions are a lot easier\nto understand and maintain than code generators.\n\nConclusion\n\nIn this blog post we described a simple and clean method to decouple ppx\nrewriters from the release schedule of the compiler. This method has the\nadvantage that once a ppx is written it is likely to work for a long time and\nespecially to work out of the box with development compilers.\n\nMoreover, this method is better for users as errors point to the real code\nthe compiler sees and when debugging they can follow the execution through\ngenerated code without trouble.\n\nAll this is currently implemented in ppx_core/ppx_driver. Our github\nrepositories haven’t been updated in a while, as the Base refactoring has disrupted\nour public release process quite a bit. These new features should be published\nin the coming weeks and will be part of the next stable release of our\npackages, planned for the beginning of December.\n",
        "url"      : "https://blog.janestreet.com/an-solution-to-the-ppx-versioning-problem/",
        "image"    : null,
        "topic"    :  ["technology","ocaml","ppx"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Observations of a functional programmer",
        "date"     : "October 27, 2016",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "I was recently invited to do the keynote at the Commercial Users of Functional\nProgramming workshop, a 15-year-old gathering which is\nattached to ICFP, the primary academic functional programming conference.\n\nYou can watch the whole video here, but\nit’s a bit on the long side, so I thought an index might be useful. (Also,\nbecause of my rather minimal use of slides, this is closer to a podcast than a\nvideo…)\n\nAnyway, here are the sections:\n\n\n  Intro\n  Two worlds, three ideas, about the\ndifference between dynamically and statically typed functional languages.\n  Still not popular About the state of\nfunctional languages in industry.\n  Lipstick on a pig How functional\nlanguages are in industry most often used to improve other languages rather\nthan being used directly.\n  The evidence isn’t in, or why it’s\nhard to find convincing experimental evidence as to the relative efficacy of\nprogramming languages.\n  You can’t value what you don’t\nunderstand About how hard it is to\nassess the utility of language features you haven’t really used.\n  The right tool for the organization,\nan alternative to “the right tool for the job”..\n  You broke it, you bought it, or, the\nplight of having been successful using a minority technology, and how to\ncontribute to the community you now depend on.\n  Academia isn’t academic, on the\nrelevance of academia’s ideas about programming.\n  Teach your children well Some\nthoughts on the place of functional programming in the university\ncurriculum, and how much tools matter there.\n  Use the advantage On evangelism\nversus just using FP.\n  Questions!\n\n\nHope you enjoy!\n",
        "url"      : "https://blog.janestreet.com/observations-of-a-functional-programmer/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "What the interns have wrought, 2016",
        "date"     : "September 13, 2016",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : ["internship"],
        "minsToRead" : 7,
        "content"  : "Now that the interns have mostly gone back to school, it’s a good time to look\nback at what they did while they were here. We had a bumper crop – more than 30\ndev interns between our London, New York and Hong Kong offices – and they\nworked on just about every corner of our code-base.\n\nIn this post, I wanted to talk about just one of those areas: building\nefficient, browser-based user interfaces.\n\nReally, that’s kind of a weird topic for us. Jane Street is not a web company,\nand is not by nature a place that spends a lot of time on pretty user\ninterfaces. The software we build is for our own consumption, so the kind of\nspit and polish you see in consumer oriented UIs are just not that interesting\nhere.\n\nBut we do care about having functional, usable user interfaces. The work we do\nis very data driven, and that rewards having UIs that are good at presenting\nup-to-date data clearly and concisely. And we want to achieve that while keeping\ncomplexity of developing these UIs low.\n\nHistorically, almost all of our UIs have been text-based. While I love the\ncommand line, it does impose some limitations. For one thing, terminals don’t\noffer any (decent) graphical capabilities. But beyond that obvious constraint,\ngetting all the pieces you need for a decent UI to work well in a terminal, from\nscrolling to text-entry widgets, requires a lot of work that just isn’t\nnecessary in a browser.\n\nSo this year, we’ve finally started pushing to make it easy for us to write\nbrowser-based applications, in particular relying on OCaml’s shockingly\ngood JavaScript back-end. This has allowed us\nto write web applications in OCaml, using our usual tools and libraries. 
As\nI’ve blogged about\npreviously, we’ve also been exploring\nhow to use Incremental, a\nframework for building efficient on-line computations, to make browser UIs that\nare both pleasant to write and performant.\n\nThat’s roughly where we were at the beginning of the summer: some good ideas\nabout what designs to look at, and a few good foundational libraries. So the\nreal story is what our interns, Aaron Zeng and Corwin de Boor, did to take us\nfarther.\n\nWeb Tables\n\nAaron Zeng’s project was to take all of the ideas and libraries we’d been\nworking on and put them to use in a real project. That project was web-tables,\na new user-interface for an existing tool developed on our options desk, called\nthe annotated signal publisher. This service provides a simple way for traders\nto publish and then view streams of interesting tabular data often based on\nanalysis of our own trading or trading that we see in the markets.\n\nThe publisher fed its data to Catalog, our internal pub-sub system. From\nCatalog, the data could be viewed in Excel, or in our terminal-based Catalog\nbrowser. But neither of these approaches worked as well or as flexibly as we\nwanted.\n\nEnter web-tables. What we wanted was pretty simple: the ability to display\ntabular data from the annotated signal publisher with customizable formatting,\nfiltering and sorting. 
This involved breaking a lot of new ground, from figuring\nout how to do the sorting and filtering in an efficient and incremental way, to\nfixing performance issues with our RPC-over-websockets implementation, to\nfiguring out a deployment and versioning story that let people easily create and\ndeploy new views of their data.\n\nOne of the great things about the project is how quickly it was put into use.\nThe options guys started using web-tables before the project was even really\nfinished, and there was a tight feedback loop between Aaron and his mentor Matt\nRussell, who was using the tool on a daily basis.\n\nOptimizing rendering\n\nAaron’s web-tables work used\nincr_dom, a small framework that\nsets up the basic idioms for creating UIs using Incremental. As part of that\nwork, we discovered some limitations of the library that made it hard to hit the\nperformance goals we wanted. Corwin de Boor’s project was to fix those\nlimitations.\n\nThe key to building an efficient UI for displaying a lot of data is figuring out\nwhat work you can avoid. To this end, Corwin wanted to build UIs that logically\ncontained thousands or even millions of rows, while only actually materializing\nDOM nodes corresponding to the hundred or so rows that are actually in view.\n\nIn order to figure out which nodes to render, he had to first figure out which\nnodes would be visible, based on the location of the browser’s viewport. This in\nturn required a way of looking up data-points based on where that data would be\nexpected to render on the screen.\n\nCorwin did this by building a data structure for storing the expected height of\neach object that could be rendered, while allowing one to query for a node based\non the sum of the heights of all the nodes ahead of it. He did this by taking an\nexisting splay tree library and rewriting it so it could be parameterized with a\nreduction function that would be used to aggregate extra information along the\nspine of the splay tree. 
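(A rough sketch of that idea, with invented names, the splaying omitted, and only the height-sum reduction shown: each node caches the total height of its subtree, so the row at a given scroll offset is found in one root-to-leaf walk.)

```ocaml
(* Sketch only: names are invented and the splay rebalancing is omitted.
   Each node caches the summed height of its subtree (the "reduction"),
   so we can find the row at a given vertical offset in O(depth). *)
type tree =
  | Leaf
  | Node of { left : tree; height : float; total : float; id : int; right : tree }

let total = function Leaf -> 0. | Node n -> n.total

(* Smart constructor that maintains the cached subtree total. *)
let node left id height right =
  Node { left; height; total = total left +. height +. total right; id; right }

(* [find t y] returns the id of the row whose span contains offset [y]. *)
let rec find t y =
  match t with
  | Leaf -> None
  | Node { left; height; id; right; _ } ->
    let l = total left in
    if y < l then find left y
    else if y < l +. height then Some id
    else find right (y -. (l +. height))
```

The real library generalizes the cached total to an arbitrary reduction function and keeps it up to date through the splay rotations.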
By integrating the reduction into the splay tree itself,\nthe necessary data could be kept up to date efficiently as the splay tree was\nmodified.\n\nCorwin also spent a lot of time improving incr_dom itself, taking inspiration\nfrom other systems like React and the\nElm language. We even corresponded a bit with Jordan\nWalke and Evan Czaplicki, the authors of React and Elm respectively.\n\nOne thing that came out of this was a neat trick for making the incr_dom API\ncleaner by using a relatively new feature of OCaml called open\ntypes. The details are a little\ntechnical (you can see the final result\nhere\nand\nhere),\nbut I think what we ended up with is a bit of an advance on the state of the\nart.\n\nThere were a lot of other bits and pieces, like improving the handling of\nkeyboard events in js_of_ocaml, creating a new incremental data-structure\nlibrary called incr_select for\nmore efficiently handling things like focus and visibility, and restructuring\nthe incr_dom APIs to make them simpler to understand and open up new\nopportunities for optimization.\n\nBy the end, Corwin was able to build a demo that smoothly scrolled over a\nmillion rows of continuously updating data, all while keeping update times below\n5ms. We’re now looking at how to take all of this work and feed it into real\napplications, including Aaron’s work on web-tables.\n\nSound like fun?\n\nIf this sounds like interesting stuff, you should consider applying for a\nsummer internship with us!\n\nJane Street internships are a great learning experience, and a lot of fun to\nboot. You even get a chance to travel: interns get to visit Hong Kong, London or\nNew York (depending on where they started!) as part of their summer.\n\nAnd the projects I described here are really just a taste. 
Here are some other\nprojects interns worked on this summer:\n\n\n  Building a low-latency order router.\n  Adding data representation optimizations to the OCaml compiler\n  A service for simulating a variety of network failures, to make it easier to\ntest distributed applications.\n  Making it possible to write Emacs extensions in\nOCaml (this was actually Aaron Zeng’s\nsecond project)\n\n\nand that’s still just a small sample. One thing I love about the work at Jane\nStreet is the surprising variety of problems we find ourselves needing to solve.\n",
        "url"      : "https://blog.janestreet.com/what-the-interns-have-wrought-2016/",
        "image"    : null,
        "topic"    :  ["technology","internship"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Unraveling of the tech hiring market",
        "date"     : "August 31, 2016",
        "authorId" : "dpowers",
        "author"   : "David Powers",
        "tags"     : ["interviewing"],
        "minsToRead" : 3,
        "content"  : "Recruiting talented people has always been challenging.\n\nIn some years that meant competing with a hot new company that aggressively\ncourted every fresh graduate with promises of stock options and IPO glory.  In\nother years there wasn’t a specific company so much as an entire rising industry\nlooking for people (I’m looking at you cloud services, driverless cars, and\npeer-to-peer sharing).  Either way, we understood the yearly back and forth.\n Our job was to explain to candidates how we stacked up, and more importantly,\nwhy a career at Jane Street might be the right choice for many of them.\n\nBut this year I got to learn a new name for a new challenge.  “Unraveling”.\n\nI first encountered it in a book I was reading for fun: “Who Gets What, and\nWhy”, by the Nobel Prize-winning economist Alvin Roth.  He does a lovely job\nexplaining the idea of a matching market.  In a matching market each person\nwants only one of each item, each item is unique, and each item can be given to\nat most one person at a time.  Jobs are a classic matching market, and just like\nany market, matching markets can work well, or poorly.\n\nUnraveling is one of the primary things that makes a matching market fail.  When\na market unravels  matches start to happen earlier and earlier, to the point\nwhere people no longer get a complete view of their options.  In the book Roth\nrelates the story of a person who stepped off of a plane to find three\nvoicemails on his phone.  The first offered him a job, the second urged him to\nrespond soon, and the last rescinded the offer because he hadn’t responded\nquickly enough.\n\nWe call them exploding offers, and this year they have gotten completely out of\nhand as companies race to the bottom in their efforts to recruit the next wave\nof interns and fresh graduates.\n\nColleges try to impose deadline limits explicitly to stop unraveling, and in the\npast these have largely been honored.  
The cheating and fudging, such as it was,\nwas kept to the fringes.  But this year it seems like the seal is broken, and\nwe’ve seen major companies delivering internship and full-time offers with 2\nweek (and less) hard deadlines.  Other companies now routinely deliver expiring\nbonus offers for signing early.  Many of these offers circumvent or outright\nbreak the guidelines set down by schools, and if past matching markets are a\nmodel for this one, next year will come with even earlier offers and worse\nconditions.\n\nThis unraveling has been the subject of a lot of discussion, both internally at\nJane Street and with the various schools we recruit at, who see it – rightly –\nas bad for their students.  How can someone make a thoughtful decision about\nwhere they want to build a career without the time to interview at more than one\nor two places?  Unfortunately, most of this discussion is out of the public\nlight, and so the unraveling continues.\n\nWe can’t control the actions of others, but we also don’t have to follow the\nherd, so we’d like to be clear:\n\nJane Street is committed to making sure that you have the time and information\nyou need to decide on an offer from us.  Our offer letters do have good-until\ndates as a matter of professional practice, but we try to work with every\ncandidate to choose a date that works for them.  We are also happy to extend the\ndate if something unexpected comes up, or, frankly, if someone just needs more\ntime.\n\nChoosing where to start your career is a big decision and we hope you have the\ntime to make a good one.\n",
        "url"      : "https://blog.janestreet.com/unraveling/",
        "image"    : null,
        "topic"    :  ["technology","interviewing"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Do you love dev tools? Come work at Jane Street.",
        "date"     : "August 30, 2016",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "In the last few years, we’ve spent more and more effort working on developer\ntools, to the point where we now have a tools-and-compilers group devoted to the\narea, for which we’re actively hiring.\n\nThe group builds software supporting around 200 developers, sysadmins and\ntraders on an OCaml codebase running into millions of lines of code. This\ncodebase provides the foundation for the firm’s business of trading on financial\nmarkets around the world.\n\nSoftware that the group develops, much of which is written in-house, includes:\n\n\n  build, continuous integration and code review systems;\n  preprocessors and core libraries;\n  editor enhancements and integration.\n\n\nThe group also devotes significant time to working on the OCaml compiler itself,\noften in collaboration with external parties, with work being released as open\nsource. Recent work includes work on the Flambda optimization framework and the\nSpacetime memory profiler.\n\nCandidates need to be familiar with a statically typed functional language and\npossess some amount of experience (within industry or otherwise) in this kind of\ninfrastructure.\n\nWe’re looking for candidates for both our New York and London office. Benefits\nand compensation are highly competitive.\n\nIf you are interested, please email tools-and-compilers-job@janestreet.com\nwith a CV and cover letter.\n",
        "url"      : "https://blog.janestreet.com/do-you-love-dev-tools-come-work-at-jane-street/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Let syntax, and why you should use it",
        "date"     : "June 21, 2016",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 12,
        "content"  : "Earlier this year, we created\na ppx_let, a PPX rewriter that\nintroduces a syntax for working with monadic and applicative libraries like\nCommand, Async, Result and Incremental. We’ve now amassed about six months of\nexperience with it, and we’ve now seen enough to recommend it to a wider\naudience.\n\nFor those of you who haven’t seen it, let syntax lets you write this:\n\nlet%bind &lt;var&gt; = &lt;expr1&gt; in &lt;expr2&gt;\n\n\n\ninstead of this:\n\n&lt;expr1&gt; &gt;&gt;= fun &lt;var&gt; -&gt; &lt;expr2&gt;\n\n\n\nwith analogous support for the map operator, via let%map. The choice of monad\nis made by opening the appropriate Let_syntax module, e.g., you might open\nDeferred.Result.Let_syntax, or Incr.Let_syntax. Note that Async.Std now\nopens Deferred.Let_syntax by default.\n\nThere’s also support for match statements, e.g.:\n\nmatch%bind &lt;expr0&gt; with\n| &lt;pattern1&gt; -&gt; &lt;expr1&gt;\n| &lt;pattern2&gt; -&gt; &lt;expr2&gt;\n\n\n\nis equivalent to:\n\n&lt;expr0&gt; &gt;&gt;= function\n| &lt;pattern1&gt; -&gt; &lt;expr1&gt;\n| &lt;pattern2&gt; -&gt; &lt;expr2&gt;\n\n\n\nThere’s also support for parallel binds and maps, using the and syntax. So\nthis:\n\nlet%map &lt;var1&gt; = &lt;expr1&gt; and &lt;var2&gt; = &lt;expr2&gt; in &lt;expr3&gt;\n\n\n\nis (roughly) equivalent to this:\n\nmap3 &lt;expr1&gt; &lt;expr2&gt; ~f:(fun &lt;var1&gt; &lt;var2&gt; -&gt; &lt;expr3&gt;)\n\n\n\nThis pattern generalizes to arbitrarily wide maps. It’s implemented using map\nand the both operator, which sacrifices some performance in exchange for\ngenerality, vs the explicit mapN operators.\n\nAdvantages\n\nMy experience with the new syntax has been quite positive. Here’s my summary of\nthe wins.\n\nParallel binds\n\nFor libraries like Command.Param and Incremental, where multi-way map\nfunctions (like `map2` and `map3`) are important, it’s been a\npretty big win in terms of the comprehensibility of the resulting code. 
This\ntends to be the case for applicatives like Command.Param, which are just\nmonads without bind. The big advantage is that by writing:\n\nlet%map x1 = some_very long expression\nand     x2 = some_other long expression\nand     x3 = yet another_thing\nin\nx1 + x2 / x3\n\n\n\nwe get to put the variable names directly next to the expressions they’re being\nbound to. Using an explicit mapN operator, the result is more awkward:\n\nmap3 \n(some_very long expression)\n(some_other long expression)\n(yet another_thing)\n~f:(fun x1 x2 x3 -&gt; x1 + x2 / x3)\n\n\n\nThis is error-prone, since it’s easy to mix up the variables and the\nexpressions. To avoid the corresponding issue in the original Command library,\nwe used some fancy combinators and the dreaded step operator, leading to some\nhard-to-understand code. The let-syntax equivalents are materially easier to\nuse.\n\nVariables first\n\nUsing a let-like syntax lets you put the variable before the definition, which\nfollows the pattern of ordinary OCaml code, and makes it a bit easier to read.\nThis cleans up some otherwise awkward patterns that are pretty common in our\ncode. In particular, instead of this:\n\nbegin\n&lt;expr1&gt;;\nlet &lt;var&gt; = &lt;expr2&gt; in\n&lt;expr3&gt;\nend\n&gt;&gt;= fun meaningful_variable_name -&gt;\n&lt;expr4&gt;\n\n\n\nYou can write this:\n\nlet%bind meaningful_variable_name =\n&lt;expr1&gt;;\nlet &lt;var&gt; = &lt;expr2&gt; in\n&lt;expr3&gt;\nin\n&lt;expr4&gt;\n\n\n\nwhich flows a bit more naturally, in part because the meaningful variable name\ncomes first, and in part because the extra begin and end are dropped.\n\nConnecting bind to let\n\nLet binds are a lot like monadic binds, even before you add in any special\nsyntax. 
i.e., this\n\n&lt;expr1&gt; &gt;&gt;= fun x -&gt; expr2\n\n\n\nis a lot like this.\n\nlet x = &lt;expr1&gt; in &lt;expr2&gt;\n\n\n\nThis is why monads are sometimes described as “programmable\nlet-binds” (or, relatedly, “programmable semicolons”, which\nare just let-binds with a unit argument.)\n\nI’ve found this to be a useful analogy in understanding monads, and the analogy\nis made clearer with let syntax. We have some preliminary reports of this making\nmonadic code more approachable for beginners, which lines up with my intuition.\n\nThe similarity between ordinary lets and monadic lets also makes diffs easier to\nread. e.g., in Async, if some function goes from being synchronous to\ndeferred, the change at the call point would now be from this\n\nlet x = some_synchronous_thing () in\nmore things\n\n\n\nto this.\n\nsome_asynchronous_thing ()\n&gt;&gt;= fun x -&gt;\nmore things\n\n\n\nWith let-syntax, we would instead change it to this.\n\nlet%bind x = some_asynchronous_thing () in\nmore things\n\n\n\ni.e., the only thing that would change would be the addition of %bind. The\nresulting diff is more targeted, making the substance of the change a bit easier\nto see, making refactoring that adds or removes blocking easier to do and\nunderstand.\n\nDisadvantages\n\nIt’s not all wine and roses. There are some downsides to let-syntax:\n\nIt’s new and different\n\nEnough said.\n\nIt’s kinda ugly\n\nThis is a matter of taste, but I’ve heard some distaste for the percent sign\nitself. That’s something forced on us by PPX, but I don’t exactly disagree.\n\nAlso, the %bind and %map are a little wordy. There’s been some talk of\nadding the ability to define alternate let syntaxes in OCaml proper, which would\nallow you to write something like this.\n\nlet* x = some_asynchronous_thing () in\nlet* y = some_other_thing () in\nlet+ z = a_third thing in\nx + y + z\n\n\n\nHere, let* would be equivalent to let%bind, and let+ is equivalent to\nlet%map. 
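(As a sketch of how such binding operators would read, here they are defined by hand for the ordinary option monad, assuming compiler support for custom let operators; this is an illustration of the shape of the syntax, not the proposal itself.)

```ocaml
(* Sketch: let*/let+ defined for the option monad, purely to
   illustrate how the proposed syntax would read. *)
let ( let* ) o f = match o with None -> None | Some x -> f x
let ( let+ ) o f = match o with None -> None | Some x -> Some (f x)

(* let* plays the role of let%bind, let+ the role of let%map. *)
let add3 a b c =
  let* x = a in
  let* y = b in
  let+ z = c in
  x + y + z
```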
Again, it’s not clear to me that this would, all in, be a win.\n\nI personally find the new syntax, all in, less ugly than using infix operators\neverywhere, but again, tastes vary.\n\nUnit binds aren’t great\n\nIn particular, because we have no “monadic semicolon” in the syntax,\nyou have to go from this:\n\n&lt;expr1&gt;\n&gt;&gt;= fun () -&gt;\n&lt;expr2&gt;\n\n\n\nto\n\nlet%bind () = &lt;expr1&gt; in\n&lt;expr2&gt;\n\n\n\nwhich is not ideal, since it’s not parallel to the normal semicolon syntax for\nthis outside of the monad. We’ve looked at making it possible to do something\nlike:\n\n&lt;expr1&gt; ;%bind\n&lt;expr2&gt;\n\n\n\nwhich would be more parallel with ordinary OCaml syntax, but that’s not yet\npossible, and it’s not clear it’s a net win.\n\nIt changes how you think about interleaving in Async\n\nIn Async, when you write:\n\nload_something ()\n&gt;&gt;= fun x -&gt;\nprocess_thing x\n\n\n\nyou can think of the point where interleaving can happen as the place where the\nbind operator is found. With let-syntax:\n\nlet%bind x = load_something () in\nprocess_thing x\n\n\n\nthe location is different, and somewhat less obvious. My experience has been\nthat this was easy to adjust to and hasn’t tripped me up in practice, but it’s a\nconcern.\n\nIdioms\n\nA few thoughts on how to use let syntax effectively.\n\nLet syntax for variables, infix for point-free\n\nOne might wonder whether there’s any use for the infix operators once you are\nusing Let_syntax. I believe the answer is yes. 
In particular, the style we’ve\nadopted is to use let syntax when binding a variable.\n\nlet%bind x = some_expression in\n\n\n\nand infix operators when going point-free, i.e., when not binding variables.\n\nlet v = some_function x &gt;&gt;| ok_exn in\n\n\n\nOne special case of this is binding unit, where some people prefer to use the\nfollowing pattern, since we don’t have a nice monadic semi-colon yet.\n\nlet%bind x = some_operation in\nsome_other_operation &gt;&gt;= fun () -&gt;\nlet%bind y = yet_another_thing () in\na_final_thing y\n\n\n\nrather than:\n\nlet%bind x = some_operation in\nlet%bind () = some_other_operation in\nlet%bind y = yet_another_thing () in\na_final_thing y\n\n\n\nMixing monads\n\nOne change we made recently was to add the return function and the monadic infix\noperators to the Let_syntax module that one opens to choose a monad. This has\nthe useful property of causing one to basically switch cleanly from one monad to\nanother when you open the Let_syntax module. Mixing multiple monads in the\nsame scope is hard to think about.\n\nCommand.Param and Deferred\n\nA few interesting cases that come up are mixing the Command.Param syntax with\nthe Deferred syntax. This one is pretty easy to solve, because you don’t\ntypically need to mix them together, really. It’s just that in the body of the\ncommand, you often want Deferred, but in the definition of the command line\nparser, you want to use Command.Param. This can be handled by doing a local\nopen of Command.Param.Let_syntax or Deferred.Let_syntax as necessary.\n\nDeferred and Deferred.Result\n\nA more complicated case is choosing between the Deferred and Deferred.Result\nmonads. 
In Async, there are infix operators that let you use both sets of bind\nand map operators (basically, with question-marks at the end of the ordinary\ninfix operators for the Deferred.Result operators.)\n\nMixing these operators together in a single scope can be a little awkward, often\nleaving people to add and remove question-marks until things compile. With let\nsyntax, you really have to pick a single monad, which is easier to read, but\nthen requires some changes in behavior. In particular, you often need to move\nthings from one monad to another. For example, if you’re in the Deferred monad\nand get a result of type Deferred.Or_error.t, you might want to do something\nlike this:\n\nlet open Deferred.Let_syntax in\nlet%bind v = some_operation x y &gt;&gt;| ok_exn in\n\n\n\nHere, mapping over ok_exn will take the error and raise it, if necessary.\nSimilarly, if you’re using an operation that’s in the ordinary Deferred monad\nbut you’re operating in the Deferred.Result monad, you might want to lift that\noperation up, i.e.:\n\nlet open Deferred.Result.Let_syntax in\nlet%bind v = some_other_operation x y |&gt; Deferred.map ~f:(fun x -&gt; Ok x) in\n\n\n\nThis is something of a mouthful, so we just added the Deferred.ok function, so\non our latest release you can write:\n\nlet open Deferred.Result.Let_syntax in\nlet%bind v = some_other_operation x y |&gt; Deferred.ok in\n\n\n\nThis idiom is useful whether or not you’re using let syntax.\n",
        "url"      : "https://blog.janestreet.com/let-syntax-and-why-you-should-use-it/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "ppx_core: context-free rewriters for better semantics and faster compilation",
        "date"     : "May 23, 2016",
        "authorId" : "jdimino",
        "author"   : "Jeremie Dimino",
        "tags"     : [],
        "minsToRead" : 9,
        "content"  : "At Jane Street, we have always been heavy users of pre-processors, first with\ncamlp4 and now ppx. Pre-processing makes the infrastructure a bit more complex,\nbut it save us a lot of time by taking care of a lot of tedious boilerplate code\nand in some case makes the code a bit prettier.\n\nAll in all, our standard set has 19 rewriters:\n\n\n  ppx_assert\n  ppx_bench\n  ppx_bin_prot\n  ppx_compare\n  ppx_custom_printf\n  ppx_enumerate\n  ppx_expect\n  ppx_fail\n  ppx_fields_conv\n  ppx_here\n  ppx_inline_test\n  ppx_js_style*\n  ppx_let\n  ppx_pipebang\n  ppx_sexp_conv\n  ppx_sexp_message\n  ppx_sexp_value\n  ppx_typerep_conv\n  ppx_variants_conv\n\n\nThese rewriters fall into 3 big categories:\n\n\n  type driven code generators: ppx_sexp_conv, ppx_bin_prot, …\n  inline tests and benchmarks: ppx_inline_test, ppx_expect, ppx_bench\n  convenience: ppx_sexp_value, ppx_custom_printf, …\n\n\nThe first category is the one that definitely justify the use of pre-processors,\nuntil we get something better in the language itself.\n\nWith such a high number of code transformations, there is an important question\nof how they compose with each other. For instance what happens if the output of\na ppx generates some code that is rewritten by another ppx?\n\nSince the switch from camlp4 to ppx a year ago really, category 1 transformers\nwere handled all at once as a whole-AST mapping pass by ppx_type_conv while\nall the other one were implemented as separate passes. With the previous list\nthat means 13 passes, given that 7 of them are ppx_type_conv plugins. This\nmeans that the output depended on the order in which the various passes were\napplied.\n\nIntuitively, one would think it’s not a big deal, given that it is quite rare\nfor a ppx to produce code that would be rewritten by another ppx. 
Still we ran\ninto several issues over time:\n\n\n  Some ppx rewriters – such as ppx_inline_test that rewrites [%%test ...]\nextensions – capture a pretty-print of their payload, for debugging\npurposes. Depending on when ppx_inline_test is applied, the payload won’t\nbe the same, as it might have been expanded by other ppx rewriters, which is\nconfusing for users.\n  A few ppx rewriters treat the payload of a specific extension point as a\nDSL to be interpreted. This is the case of ppx_sexp_value and\nppx_sexp_message. If another ppx messes with the payload before them, the\nresult is unspecified. We had such an issue with ppx_here: inside\n[%sexp ...], [%here] is interpreted by ppx_sexp_value and\nppx_sexp_message and produces \"&lt;filename&gt;:&lt;line&gt;:&lt;column&gt;\", while\noutside it is interpreted by ppx_here and produces a record of type\nLexing.position\n\n\nInitially we dealt with these issues by using a specific order in the default\nset of rewriters, but that’s more of a dirty hack than a real solution. Often\ndevelopers are not aware of this and might end up using a wrong order when using\na custom set of rewriters. Moreover this worked because we have control over the\norder with Jenga, but in open-source packages using oasis, ocamlbuild and\nocamlfind we have no control over the final ordering.\n\nBut apart from the semantic problems, there is an obvious performance problem:\nall the transformations are local, but still we are doing 12 passes over the\nentire AST. What a waste of CPU time!\n\nThe different ways of composing ppx rewriters\n\nBefore jumping into the subject of this post, we recall a few of the various\nmethods one can use to compose ppx rewriters.\n\nVia separate process\n\nThe default method, which was adopted early by the community, is to define each\ntransformation as a separate executable. To compose them, one just has to call\nall the executables one by one. 
The main advantage of this approach is that each\ntransformation is a black box and can do whatever dirty hacks it wants.\n\nThis is what you get when you are using a ppx by just putting the package name\nin your build system without doing anything special.\n\nVia a driver\n\nAnother approach, which we developed at Jane Street, is to link all the\ntransformations into a single executable. For this to work properly all\ntransformations must use the same framework. Technically they all register\nthemselves with ppx_driver via a call to Ppx_driver.register_transformation.\nPpx_driver is then responsible for composing them.\n\nThere are several advantages to the second approach: since ppx_driver has\nknowledge of all transformations, it can do extended checks such as making sure\nthat all attributes have been interpreted. This helps detect typos, which in\npractice saves a lot of debugging time. But what really interests us in this post\nis that it can use more clever composition methods.\n\nCode transformations using ppx_driver can still export a single executable\ncompatible with the first method, which is why all Jane Street ppx rewriters can\nbe used with both methods.\n\nppx_driver has an ocamlbuild plugin\nto simplify building custom drivers.\n\nContext free transformations\n\nGiven that all transformations are local, it was clear that they should be\ndefined as such; i.e. if all you want to do is turn [%sexp \"blah\"] into\nSexp.Atom \"blah\", you don’t need to visit the whole AST yourself. You just\nneed to instruct whatever framework you are using that you want to rewrite\n[%sexp ...] extension points.\n\nContext-free extension expander\n\nWe started with this idea a few months ago by adding an API in ppx_core to\ndeclare context-free extension expanders. For instance, this shows how you would\ndeclare a ppx that interprets an extension [%foo ...] 
inside expressions:\n\nopen Ppx_core.Std\n\nlet ext =\n  Extension.declare \"foo\" Expression\n    Ast_pattern.(...)\n    (fun ~path ~loc &lt;parsed-payload...&gt; -&gt; &lt;expansion&gt;)\n\nlet () = Ppx_driver.register \"foo\" ~extensions:[ext]\n\n\n\nThe Ast_pattern.(...) bit describes what the extension expects as its payload.\n\nSince ppx_driver knows about all the local extension expanders, it can expand\nthem all in one pass over the AST. Moreover it can detect ambiguities and error\nout in such cases.\n\nThere was a choice to make as to whether to rewrite the AST in a bottom-up or\ntop-down manner. We chose top-down, to allow extension expanders to interpret\ntheir payload before anyone else, and so they can correctly implement a DSL.\n\nThis solved most of the initial issues and reduced the number of passes to 7:\n\n\n  all extension expanders\n  ppx_type_conv\n  ppx_custom_printf\n  ppx_expect\n  ppx_fail\n  ppx_pipebang\n  ppx_js_style\n\n\nppx_expect wasn’t initially defined as a context-free extension expander for\ntechnical reasons.\n\nMaking everything context-free\n\nRecently we went even further and added a Context_free module to Ppx_core to\ncover all of our transformations. It doesn’t support all possible rewriting but\nsupports enough to implement a lot of common ones:\n\n\n  context-free extension expanders\n  some specific support to implement type-driven code generators\n  support for ppx rewriters interpreting a function application at\npre-processing time, such as ppx_custom_printf that interprets\n!\"&lt;format&gt;\"\n\n\nWith this we reduced the number of passes to only 2:\n\n\n  context free transformations\n  ppx_js_style\n\n\nppx_js_style is still done in a separate pass for simplicity. 
It is run last\nto ensure we don’t generate code that doesn’t match our coding rules.\n\nNow, whatever order developers specify their ppx in their build system, they\nwill get the exact same output.\n\nSeeing the exact passes\n\nPpx_driver got a new option to print what passes it will execute, for instance\nwith ppx-jane, which is a standard driver containing all of the Jane Street ppx\nrewriters linked in (available in the ppx_jane package in opam):\n\n$ ppx-jane -print-passes\n&lt;builtin:freshen-and-collect-attributes&gt;\n&lt;bultin:context-free&gt;\n&lt;builtin:check-unused-attributes&gt;\n&lt;builtin:check-unused-extensions&gt;\n\n$ ppx-jane -print-passes -no-check\n&lt;bultin:context-free&gt;\n\n\n\nSafety checks are implemented as additional passes, which is why we see more than\none pass by default.\n\nNumbers\n\nNo performance comparison was done when introducing context free extension\nexpanders, but we did some for the second stage, when we changed all ppx\nrewriters to use Context_free; processing a file with the resulting driver was\ntwice as fast (check passes included).\n\nBut how does this compare to the more traditional method of running each\nrewriter in a separate process? To find out we did some benchmarks by taking one\nof the biggest ml files in core_kernel (src/command.ml) and comparing the two\nmethods. 
We put a type error on the first line to be sure we stop just after\npre-processing.\n\nFor reference, following are the numbers for calling ocamlfind ocamlc on the\nfile with no pre-processing:\n\n$ time ocamlfind ocamlc -c command.ml\nFile \"command.ml\", line 1, characters 12-15:\nError: This expression has type char but an expression was expected of type\n         int\n\nreal 0m0.022s\nuser 0m0.016s\nsys  0m0.006s\n\n\n\nTo preprocess the file with ppx_jane as a single driver executable, one just\nhas to pass one -ppx option, or a -pp option given that ppx_driver can be used\neither as a -ppx or as a -pp:\n\n# via -ppx\n$ time ocamlfind ocamlc \\\n  -ppx 'ppx-jane -as-ppx -inline-test-lib core -inline-test-drop -bench-drop' \\\n  -c command.ml 2&gt; /dev/null\n\nreal 0m0.095s\nuser 0m0.074s\nsys  0m0.020s\n\n# via -pp\n$ time ocamlfind ocamlc \\\n  -pp 'ppx-jane -dump-ast -inline-test-lib core -inline-test-drop -bench-drop' \\\n  -c command.ml 2&gt; /dev/null\n\nreal 0m0.091s\nuser 0m0.066s\nsys  0m0.024s\n\n# via -pp, with checks disabled\n$ time ocamlfind ocamlc \\\n  -pp 'ppx-jane -dump-ast -no-check -inline-test-lib core -inline-test-drop -bench-drop' \\\n  -c command.ml 2&gt; /dev/null\n\nreal 0m0.070s\nuser 0m0.051s\nsys  0m0.018s\n\n# via -pp, without merging passes\n$ time ocamlfind ocamlc \\\n  -pp 'ppx-jane -dump-ast -no-merge -inline-test-lib core -inline-test-drop -bench-drop' \\\n  -c command.ml 2&gt; /dev/null\n\nreal 0m0.229s\nuser 0m0.206s\nsys  0m0.022s\n\n\n\nUsing the other method turned out to be quite painful, given that the various\nppx cannot share command line arguments; they had to be specified more than\nonce:\n\n$ time ocamlfind ocamlc -package ppx_jane \\\n  -ppxopt \"ppx_inline_test,-inline-test-lib blah -inline-test-drop\" \\\n  -ppxopt \"ppx_bench,-inline-test-lib blah -bench-drop\" \\\n  -ppxopt \"ppx_expect,-inline-test-lib blah\" \\\n  -c command.ml 2&gt; /dev/null\n\nreal 0m0.339s\nuser 0m0.233s\nsys  0m0.098s\n\n\n\nSo 
unsurprisingly, the single-pass, single-executable method is a lot\nfaster.\n\nAvailability\n\nThis code is available on github. The context-free extension point API is\nalready available in opam. The newer one is only in the git repository for\nppx_core and ppx_driver. You can try them out by using our development opam\nrepository. You should have a\nlook at this if you care about how your rewriters are composed and/or if you\ncare about compilation speed.\n\n\n  ppx_js_style is not currently released; it is an internal ppx that we use\nto enforce our coding standards.\n\n",
        "url"      : "https://blog.janestreet.com/ppx_core-context-free-rewriters-for-better-semantic-and-faster-compilation/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Seven Implementations of Incremental",
        "date"     : "March 9, 2016",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "We finally got a decent recording of one of my favorite talks. This one is about\nour Incremental library (which I\nwrote about here), and in particular about the\nstory of how we got to the present, quite performant, implementation.\n\nIt’s not clear from the talk, but the work on this library wasn’t done by me:\nThe initial version was implemented by Stephen Weeks and Milan Stanojevic, and\nmost of the intermediate versions were implemented by Nick Chapman and Valentin\nGatien-Baron.\n\nThe high quality org-mode slides, though, are all me.\n",
        "url"      : "https://blog.janestreet.com/seven-implementations-of-incremental/",
        "image"    : "https://blog.janestreet.com/seven-implementations-of-incremental/ron-photo.jpg",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "OCaml 4.03: Everything else",
        "date"     : "March 1, 2016",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 9,
        "content"  : "In my previous post I wrote about Flambda, which is the single\nbiggest feature coming to OCaml in this release. In this post, I’ll review the\nother features of 4.03 that caught my eye.\n\nInline records\n\nVariants are my favorite thing about OCaml, and in this release, they’re getting\nbetter. You’ve always been able to define variants with multiple arguments,\ne.g.:\n\ntype shape =\n  | Circle of float * float * float\n  | Rect of float * float * float * float\n\n\n\nBut, as with this example, it can sometimes be a little hard to figure out what\nthe individual fields mean, since they don’t have labels. We can\nmake this better by replacing our multi-argument variants with single argument\nvariants containing appropriately named records, as follows.\n\ntype circle = { center_x: float; \n                center_y: float; \n                radius: float;\n              }\n\ntype rect = { x_lo: float; \n              y_lo: float; \n              x_hi: float; \n              y_hi: float;\n            }\n\ntype shape =\n  | Circle of circle\n  | Rect of rect\n\n\n\nThis works, but the separation of the record type from the variant definition is\na little awkward. Beyond that, this approach imposes extra runtime costs. A\nmulti-argument variant takes up just a single heap-allocated block, while a\nsingle argument variant containing a record takes two blocks.\n\nWith 4.03, we can have the best of both worlds by defining variants containing\ninline records. Here’s what they look like.\n\ntype shape =\n  | Circle of { center_x: float; \n                center_y: float; \n                radius: float;\n              }\n  | Rect of { x_lo: float; \n              y_lo: float; \n              x_hi: float; \n              y_hi: float;\n            }\n\n\n\nAnd we can write code that uses these types as follows:\n\nlet area = function\n  | Circle c -&gt; 3.14159 *. c.radius *. c.radius\n  | Rect r -&gt; (r.x_hi -. r.x_lo) *. (r.y_hi -. 
r.y_lo)\n\n\n\nNote, however, that the values r and c aren’t quite first-class. For\nexample, we can’t use them away from the context of the match statement in which\nthey’re found. So this function:\n\nlet maybe_rect = function Circle _ -&gt; None | Rect r -&gt; Some r\n\n\n\nfails with the error This form is not allowed as the type of the inlined record\ncould escape.\n\nEven with this complexity, inlined records are really convenient.\n\nAnother advantage of inline records is that they allow us to express variants\nwith mutable fields. This is useful in a variety of circumstances. In Core, we\nfake this using Obj.magic for our mutable AVL tree implementation, and now we\ncan remove these hacks. Similar uses of Obj.magic were removed from OCaml’s\nown imperative queue module in this release as well.\n\nUchar and result\n\nA couple of useful types have been added to the standard library: Uchar.t and\nresult. Uchar.t represents a unicode character, and is effectively just an\nint under the covers.\n\nThe result type is used for signaling success or failure, and has the\nfollowing definition.\n\ntype ('a,'b) result = Ok of 'a | Error of 'b\n\n\n\nBoth of these are in some sense trivial, but they’re there as a coordination\npoint between different libraries. Lots of OCaml libraries have some analogue of\nresult, and each OCaml Unicode library has its own character type. Including\nthese in the standard library provides an easy point of agreement\nbetween different external libraries, while adding almost no complexity to the\ncore distribution itself.\n\nBetter unboxing for externals\n\nA small but valuable change: it’s now possible to write C bindings through which\none can pass unboxed versions of a number of different types, including floats,\nInt64’s, Int32’s and Nativeint’s. This was previously possible only for calls\nthat took only floats and returned a float. 
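As a rough sketch of the new syntax (my own example, not from the release notes; the bytecode stub name caml_log1p_byte is a placeholder you would have to implement in C), a binding to C’s log1p can now be declared with per-argument unboxing annotations:

```ocaml
(* Sketch: bind C's log1p with unboxed floats.  In native code the
   argument and result are passed without boxing, and [@@noalloc]
   promises the C function never interacts with the OCaml GC.
   "caml_log1p_byte" is a hypothetical bytecode stub. *)
external log1p : (float [@unboxed]) -> (float [@unboxed])
  = "caml_log1p_byte" "log1p"
  [@@noalloc]
```
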
This makes it easier to write\nefficient, zero-allocation C bindings in a wider variety of cases.\n\nGC Latency improvements\n\nI talked about most of this here, so I won’t go\ninto great detail now. But the news is that the changes that Damien Doligez did\nduring his stay at Jane Street have finally made it upstream.\n\nEphemerons\n\nTo a first approximation, you can think of a GC as a machine that determines\nwhat memory can be reclaimed by figuring out what data is reachable, and then\nreclaiming everything else. The simplest notion of reachability counts\neverything reachable by following pointers, starting at the GC roots, which\nare mostly just the values on the call stack.\n\nThis notion of reachability is a bit too restrictive, however. In particular, in\nsome circumstances you want to keep pointers to objects without preventing those\nobjects from being reclaimed. This is useful for some kinds of caching, where\nyou want to cache previously computed values for as long as they’re referenced,\nbut no longer.\n\nOCaml has long had an answer to this problem, which is the notion of a weak\npointer. Weak pointers are references that aren’t counted when computing\nreachability. When an object that is pointed to by a weak reference is\ncollected, the weak reference itself is nulled out.\n\nWeak references are good enough for some purposes (e.g. hash-consing), but\nthey can be awkward for many use cases. One basic use-case for which weak\nreferences are an awkward fit is memoizing a function, where one wants to keep\nentries in the table as long as the input to the function in question is still\nreachable.\n\nYou could imagine just keeping the key of the hash-table in a weak pointer, and\nthen using a finalizer to remove the entry in the table once the key gets\ncollected. 
But this fails if there is a reference from the output of the\nfunction back to the key, which is common enough.\n\nEphemerons were proposed back in\n’97\nby Barry Hayes to solve just this problem. The idea is pretty simple: an\nephemeron has multiple keys and a single data entry. When determining whether\nthe keys are alive, one doesn’t count references from values that are reachable\nonly via the ephemeron, so the references from the data back to the key don’t\nkeep the key alive. Also, once any of the ephemeron’s keys is collected, the key\nand the data element are removed immediately. Weak pointers are now just a\nspecial case of ephemerons – a weak pointer is effectively just an ephemeron\nwith no data.\n\nEphemerons don’t come up that much, but if you need to build a memoization\ntable, ephemerons make the task simpler and the result less leak-prone.\n\nLicenses, Github and OCamlbuild\n\nA few organizational changes are landing in 4.03 that I think are worth noting.\nFirst, OCaml development is officially moving from Subversion to Git, with\nGithub as the primary coordination point for OCaml development.\n\nSecond, OCaml’s license is changing from the somewhat awkward and rarely used\nQPL, to LGPLv2 with a linking exception. The latter had already been used for\nvarious libraries distributed with the compiler. But, acknowledging the fact\nthat more and more of the guts of the compiler are being used as libraries for a\nvariety of reasons, it was recently decided to move everything to LGPLv2.\n\nAnd finally, ocamlbuild is being moved out of the core OCaml distribution.\nThis is one in a series of decisions to break out software that was previously\nbundled together with OCaml. 
This allows the core team to focus more on the core\ncompiler itself, and makes the exported projects freer to make changes on\nwhatever schedule they see fit.\n\nI think keeping the OCaml distribution lean is an excellent strategic decision,\nallowing the core team to focus on the compiler itself, and allowing tooling and\nlibraries to be worked on independently.\n\nTogether, these are all small changes. But they’re part of a trend towards\nmaking OCaml development simpler and more agile.\n\nWhat’s missing\n\nThere are a number of much anticipated features that haven’t made it into this\nrelease. In particular, the multicore\nGC, which at one point\nhad been expected to land in 4.03, has been pushed back, likely to 4.04.\nAlgebraic\neffects, which\nare in part motivated by the multicore GC, also didn’t make it.\n\nAnd finally, modular implicits, which are\nintended to provide typeclass-like functionality for OCaml, are not in this\nrelease either. That said, they’re being actively worked on, and I’m expecting\nthere will be more news about those in the next six months.\n\nAll in, it’s been a pleasure watching this release take shape. And just as\nimportant as what got in is what didn’t. Despite a greatly increased rate of\nchange, the process is quite conservative – nonsense features just don’t make\nit in. My gratitude goes out to the core team for their safekeeping of the\nlanguage.\n",
        "url"      : "https://blog.janestreet.com/ocaml-4-03-everything-else/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "A better inliner for OCaml, and why it matters",
        "date"     : "February 24, 2016",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 10,
        "content"  : "OCaml 4.03 is branched and a first release candidate is imminent, so it seems\nlike a good time to take stock of what’s coming.\n\nThis post will focus on just one of those features: Flambda, a new IR\n(intermediate representation) in the depths of the compiler designed to allow\nfor better inlining, paired with a set of optimizations leveraging that IR.\n\nWhy inlining matters\n\nIf your expectations about inlining come from a language like C, you might not\nbe all that excited by Flambda. In C, after all, the benefits of inlining are\nrelatively small, mostly allowing one to avoid function call overhead. That’s\nuseful, but has limited impact.\n\nIn a language like OCaml, inlining is about more than just function call\noverhead. That’s because inlining grows the amount of code that the optimizer\ncan see at a given point, and that makes other optimizations more effective.\nAnd there are simply more optimizations available in a language like OCaml than\nin one like C. In particular, many allocations can be optimized away, which can\nhave a rather big effect on the performance of a program.\n\nTo get a sense of how we can eliminate allocations, consider this simple\nexample.\n\nlet rec fold l ~init ~f =\n  match l with\n  | [] -&gt; init\n  | hd :: tl -&gt; fold tl ~init:(f init hd) ~f\n\nlet pow_sum l n = \n  fold l ~init:0 ~f:(fun x y -&gt; pow x n + pow y n)\n\n\n\nIn the current compiler, every invocation of pow_sum will allocate a closure\nfor the function f. With Flambda, this code should be allocation free, since\nthe fold can be inlined into pow_sum, f can be inlined into the body of the\nnow-inlined fold, and the fold can be specialized by adding an extra argument,\nthus making the closure allocation unnecessary.\n\nHere’s another example. Imagine you wanted to sum the first and last elements of\na list. 
You could do this using functions in the list module and a simple\npattern match, as follows.\n\nlet first_plus_last l =\n  match List.hd l, List.last l with\n  | None, _ | _, None -&gt; None\n  | Some x, Some y -&gt; Some (x + y)\n\n\n\nWithout any inlining, this function creates three heap-allocated values, which\nare the various options being returned.\n\nNow, let’s consider what happens if we rewrite this code using the option monad.\n\nlet (&gt;&gt;=) o f =\n  match o with\n  | None -&gt; None\n  | Some x -&gt; f x\n\nlet first_plus_last l =\n  List.hd l   &gt;&gt;= fun x -&gt;\n  List.last l &gt;&gt;= fun y -&gt;\n  Some (x + y)\n\n\n\nOr, equivalently, using our ppx_let syntax:\n\nlet first_plus_last l =\n  let%bind x = List.hd l in\n  let%bind y = List.last l in\n  Some (x + y)\n\n\n\nWithout any inlining, this implementation will create five heap allocated\nvalues: one for each of the closures on the right hand side of the bind, one\neach for the options returned from List.hd and List.last, and one for the\nfinal Some.\n\nIt’s of course possible to write a version that only allocates one object, for\nthe return value.\n\nlet first_plus_last l =\n  match l with\n  | [] -&gt; None\n  | x::_  -&gt;\n    let rec last_exn = function\n      | [] -&gt; assert false\n      | [y] -&gt; y\n      | _ :: tl -&gt; last_exn tl\n    in\n    Some (x + last_exn l)\n\n\n\nBut this is obviously uglier and harder to read than either of the earlier\nversions.\n\nWith Flambda (and the appropriate flags, i.e., -O3 -unbox-closures), these\nexamples all allocate the same amount. That’s because, once the inlining is done\nand all the code is in one place, the compiler can do things like observe that\nthe option returned from List.hd is immediately deconstructed and is never\nused for anything else. 
As a result, the allocation of the options can be\nremoved entirely.\n\nThis example highlights why these kinds of compiler optimizations are valuable.\nIt’s not that they’re strictly necessary for good performance – usually, pretty\ngood performance can be achieved with sufficiently ugly code. But what good\noptimizations can do for us is let us write simpler, easier to understand code,\nwithout sacrificing performance.\n\nA basis for future optimizations\n\nIn addition to the optimizations that will be there when 4.03 lands, Flambda\nwill also make it easier to build new optimizations. One example comes from a\nproject that was done last summer by Will Crichton, one of our interns. The aim\nof the project was to make the kind of allocation removal I described above more\npredictable.\n\nFrom how I described it above, it might seem like Flambda’s ability to remove\nallocations relies on the ability to inline the allocating functions in their\nentirety. That would be problematic, because what is and isn’t inlined can be a\nbit unpredictable. That’s because there have to be some heuristic thresholds,\nsince inlining everything can lead to excessively large executables.\n\nAs such, depending on inlining makes it harder to predict the allocation\nbehavior of your code, since it now hinges on the heuristics for determining\nwhen a function should be inlined. The situation isn’t quite so grim – some of\nthe optimizations that come along with Flambda (like -unbox-closures) don’t\ndepend so critically on inlining. But still, many of Flambda’s optimizations do\ndepend on inlining, and as such whether they hit becomes harder to predict.\n\nBut we can make this better. The specific project that Will worked on had to do\nwith allocations that come from returning small objects that are immediately\ndeconstructed. 
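The pattern in question looks something like the following (a made-up toy; the function names are mine, not from the project):

```ocaml
(* divmod returns a small tuple that callers typically tear apart
   immediately; without optimization, that tuple is a short-lived
   heap allocation whose only purpose is to carry two results. *)
let divmod a b = (a / b, a mod b)

let sum_last_two_digits n =
  let q, r = divmod n 10 in   (* tuple constructed...            *)
  (q mod 10) + r              (* ...and deconstructed right away *)

let () = assert (sum_last_two_digits 1234 = 7)
```
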
With Flambda as it is today, such allocations are only removed if\nboth the construction and deconstruction of the value are together, which\noften requires inlining. Will’s project made this more robust by changing the\ncalling conventions in Flambda, specifically by adding a first class\nmulti-argument return to the Flambda IR.\n\nWith a multi-argument return in place, a function that returns a small tuple,\nsay, could instead be compiled to return the components of the tuple as\nindividual values to be passed back on the stack. Then, this function can be\nwrapped with a small function that picks those values off the stack and\nallocates the necessary object.\n\nThis small wrapper function is by design small enough to be inlined reliably.\nOnce that inlining is done, both the construction and deconstruction of the\nvalue will be visible to the optimizer, and they can be collapsed. This should\nwork whether or not the main function is inlined.\n\nThis is really just one example. Our hope and expectation is that Flambda will\nbe the basis of many improvements in the compilation of OCaml over the next few\nyears.\n\nImpact\n\nThe performance impact we’ve seen from our experiments with Flambda seems pretty\npromising so far. On real world applications we’ve tested, it’s pretty normal to\nsee allocations reduced by 20-30%. We’ve found similarly sized improvements in\napplication latency as well. And we think that these numbers will improve yet\nmore as more optimizations are added on top of Flambda.\n\nBeyond improving existing programs, we are already seeing how Flambda allows us\nto write prettier and cleaner code without compromising on performance. For\nexample, Flambda allows us to make freer use of OCaml’s functors, which are a\ngreat abstraction tool, but one that imposed significant costs pre-Flambda. With\nFlambda, functors can simply be inlined away.\n\nFlambda makes things easier in other ways too. 
For example, we’re in the\nprocess of developing a new internal protocol for zero-allocation messaging,\nwhich requires a lot of code generation, via PPX. Being able to rely on Flambda\ngreatly simplifies that code generation, since it lets us write a fairly naive\ncode generator, which generates efficient code because of Flambda. We can write\ncode that computes offsets into the message naively, and Flambda, via a\ncombination of inlining and constant folding, moves the computation of these\noffsets to compile time, so the runtime field access becomes a simple lookup.\n\nEven if you’re perfectly happy with OCaml’s performance as is, Flambda is still\nan exciting change. That’s because various upcoming language improvements like\nmodular implicits (a feature that brings some of the same benefits as\nHaskell’s typeclasses) will only really perform acceptably well with a good\ninliner in place. So in addition to making your current programs run faster,\nFlambda also enables new abstractions that can make your future programs easier\nto read and write.\n\nOne downside of all this is that it reduces the predictability of OCaml’s\nperformance, which is an important property of the language. Part of how we hope\nto address this is by improving Flambda’s heuristics to be more predictable, but\nthat’s unlikely to be enough on its own. That’s why OCaml 4.03 comes with new\nannotations that let the programmer require or prohibit inlining for a\nparticular function.\n\nOver time, we hope to find a good balance, where the heuristics do a pretty good\nand pretty predictable job by default, but where we give programmers hooks to\ncontrol things more precisely where it matters.\n\nThanks\n\nGetting Flambda this far has been a big project, and a lot of people were\ninvolved. OCamlPro did a lot of the heavy lifting, with Pierre Chambart writing\nmost of the compiler code, and Louis Gesbert and Michel Laporte working on\nbenchmarking. 
Jane Street jumped in too, with Mark Shinwell and Leo White\ncontributing mightily to the code and design as part of the review and\nstabilization process, and Jeremie Dimino helping with testing. And a number of\nother people on the core team (notably Alain Frisch, Gabriel Scherer, Runhang\nLi, Damien Doligez) contributed in a variety of ways to the final product.\nThanks to all involved!\n",
        "url"      : "https://blog.janestreet.com/flambda/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Self Adjusting DOM and Diffable Data",
        "date"     : "February 10, 2016",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 17,
        "content"  : "In my last post, I gave some simple examples showing how\nyou could use\nself adjusting computations,\nor SAC, as embodied by our Incremental library, to\nincrementalize the computation of virtual dom nodes. In this post, I’d like to\ndiscuss how we can extend this approach to more realistic scales, and some of\nthe extensions to Incremental itself that are required to get there.\n\nAlong the way, we’ll discuss the way that diffable data structures relate to\nself adjusting computations, and how we can use ordinary pure data types and\nfunctional updates to those data types as the basis for building efficient\nincremental computations.\n\nFirst, let’s bring back the code of the example from that post, which is a\nsimple component that displays the state of a machine in a datacenter.\n\nmodule Machine = struct\n  module Model = struct\n    type display_mode = Temperature | Uptime\n    type t = { temperature: float;\n               uptime: Time.Span.t;\n               selected: bool;\n               mode: display_mode;\n             }\n      [@@deriving fields]\n  end\n\n  let view model =\n    let mode        = Incr.map model ~f:Model.mode        in\n    let temperature = Incr.map model ~f:Model.temperature in\n    let selected    = Incr.map model ~f:Model.selected    in\n    let uptime      = Incr.map model ~f:Model.uptime      in\n    let text =\n      match%bind mode with\n      | Temperature -&gt;\n        let%map x = temperature in sprintf \"%F degrees\" x\n      | Uptime -&gt;\n        let%map x = uptime in Time.Span.to_string x\n    in\n    let%map text = text and selected = selected in\n    Vdom.text\n      (if selected then [Vdom.class_ \"selected\"] else [])\n      text\nend\n\n\n\nIt’s worth mentioning that in the service of exercising the machinery, I’ve\nover-incrementalized this example. It would very likely be cheaper to just\ngenerate the necessary vdom in its entirety every time the model changes. 
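For comparison, a completely non-incremental version of the same view would just pattern match on a plain model value; here is a sketch against the post’s own Machine.Model and Vdom API (a simplification, not code from the library):

```ocaml
(* Sketch: rebuild the whole vdom node from scratch on every model
   change -- no cutoffs, but also no incremental plumbing. *)
let view_simple (model : Machine.Model.t) : Vdom.t =
  let open Machine.Model in
  let text =
    match model.mode with
    | Temperature -> sprintf "%F degrees" model.temperature
    | Uptime -> Time.Span.to_string model.uptime
  in
  Vdom.text
    (if model.selected then [Vdom.class_ "selected"] else [])
    text
```
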
That\nsaid, the above is still a good example of how to use Incremental, and it’s\nworth understanding how it works.\n\nThe first thing that is done by view is to project out different parts of the\nmodel into separate incrementals. The key observation is that, even though the\nmodel changes, if Model.mode model doesn’t change, then the corresponding\nincremental will cut off computation there, preventing dependent computations\nfrom refiring. So, whenever the model changes, we’ll check whether each field in\nthe model changed before launching dependent computations.\n\nBy default, this cutoff happens based on physical equality; but one can also\nspecify more expansive semantic notions of equality when that’s important.\n\nMapping over maps, incrementally\n\nThe projection approach works well for simple fixed-size records, but it doesn’t\nwork when it comes to more complex models with variable sized components. To see\nwhy, let’s consider what happens if we add one more layer, an outer model\nconsisting of a collection of named machines. We do this using OCaml’s Map\ndata structure, which is essentially a functional dictionary.\n\nmodule Model = struct\n  type t = Machine.Model.t String.Map.t\nend\n\n\n\nHow do we construct an incremental view for this? We can no longer simply\nproject out all of the individual machine models into a fixed set of\nincrementals, since the number of machines in the map is not known in advance.\n\nWhat we need here is a way of efficiently mapping over an incremental map. (The\nnaming collision between the Incr.map function and the Map.t data structure\nis an unfortunate source of confusion here.) Here’s a signature that captures\nwhat we need:\n\nval Incr.Map.map\n  :  ('k,'v,'cmp) Map.t Incr.t\n  -&gt; f:('v Incr.t -&gt; 'w Incr.t)\n  -&gt; ('k,'w,'cmp) Map.t Incr.t\n\n\n\nIf you’re not used to reading OCaml signatures, this can take a moment to\ndigest. 
First, note that ('k,'v,'cmp) Map.t denotes a map whose key is of type\n'k, whose data is of type 'v. (You can ignore the 'cmp parameter.)\n\nSo, what does this function do? A good rule of thumb for incremental is that if\nyou want to understand the meaning of an incremental function (ignoring\nperformance), you can just drop all the incremental bits. If we do that for the\nabove type signature, we get this.\n\nval Map.map\n  :  ('k,'v,'cmp) Map.t\n  -&gt; f:('v -&gt; 'w)\n  -&gt; ('k,'w,'cmp) Map.t\n\n\n\nThis type correctly suggests that the function is, as the name suggests, a map,\nwhich is to say, a function that transforms the elements of a container, thereby\nproducing a new instance of that container.\n\nThat tells us what Incr.Map.map is supposed to compute, but what about the\nperformance of that computation? In particular, what is the structure of the\nincremental graph this generates?\n\nThe following picture illustrates the desired incremental graph. The idea is\nthat there is a path in the graph for every key/data pair in the map. When a\ngiven key is updated, an update should be passed down the corresponding\nincremental path. When a key is added or removed, then paths should be added or\nremoved accordingly. And, rather than recomputing map' from scratch on every\nupdate, we want to do the corresponding update, insert or delete into map'.\n\n\n\nThis function is enough to build an efficient version of our view. In\nparticular, we can write our view function as follows:\n\nlet view (model : Model.t Incr.t) : Vdom.t Incr.t =\n  let%map machine_vdoms = Incr.Map.map model ~f:Machine.view in\n  Vdom.node \"div\" [] (Map.data machine_vdoms)\n\n\n\nNote that at the end, we do need to construct a complete list of all the\ndisplayed machines, and that does have to be reconstructed in its entirety on\nevery change. That’s because the virtual dom API uses lists, which can’t be\nupdated incrementally in an efficient way. 
If the virtual dom were modified to\ntake a tree-like structure for nodes, this could be handled efficiently.\n\nThat said, I don’t think this is a serious practical problem, since in most UIs,\nthe cost of allocating such nodes is a small part of the overall cost of\nreconstructing the virtual dom.\n\nDiffable data and incrementality\n\nIn order to implement Incr.Map.map efficiently, we need some special support\nboth from our Map datatype and from incremental itself. From Map, we need a\nway to efficiently figure out a set of updates, inserts and deletes that will\ntake us from one map to another. Happily, the Map module in Core_kernel\ncontains a function that can help.\n\nval Map.symmetric_diff\n  :  ('k, 'v, 'cmp) Map.t\n  -&gt; ('k, 'v, 'cmp) Map.t\n  -&gt; data_equal:('v -&gt; 'v -&gt; bool)\n  -&gt; ('k * [ `Left of 'v | `Right of 'v | `Unequal of 'v * 'v ]) Sequence.t\n\n\n\nHere, the Left case corresponds to data that’s only in the first map, but not\nin the second, and so is a deletion. Similarly, the Right case corresponds to\nan addition, and the Unequal case corresponds to an update to the value.\n\nA key property of Map.symmetric_diff is that it takes advantage of physical\nequality of nodes by short-circuiting. This means that if you take a map, make\nsome modifications to it, and then compute the symmetric diff between the\noriginal map and the new one you just created, the computational effort is\nproportional to the work that was required to generate the second map in the\nfirst place. In other words, you can efficiently read back the set of changes\napplied to a map.\n\nWith Map.symmetric_diff, we have a way to efficiently determine what has\nchanged in the map, and thereby which incrementals need to be updated, added or\nremoved. 
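To make the shape of that diff concrete, here is a naive reference version written over the stdlib’s Map (my own illustration: Core_kernel’s real symmetric_diff produces this kind of output, but short-circuits on physically equal subtrees instead of visiting every key):

```ocaml
module M = Map.Make (String)

(* Mirrors Core_kernel's [`Left | `Right | `Unequal] cases:
   Left = deleted, Right = added, Unequal = value changed. *)
type 'v change = Left of 'v | Right of 'v | Unequal of 'v * 'v

(* Naive diff: walk the union of keys and compare both sides.  The real
   implementation skips subtrees shared between the two maps. *)
let naive_symmetric_diff ~data_equal m1 m2 =
  let all_keys = M.union (fun _ v _ -> Some v) m1 m2 in
  M.fold
    (fun key _ acc ->
      match M.find_opt key m1, M.find_opt key m2 with
      | Some v, None -> (key, Left v) :: acc
      | None, Some v -> (key, Right v) :: acc
      | Some v1, Some v2 when not (data_equal v1 v2) ->
        (key, Unequal (v1, v2)) :: acc
      | _ -> acc)
    all_keys []
  |> List.rev

let () =
  let m1 = M.of_seq (List.to_seq [ "a", 1; "b", 2 ]) in
  let m2 = M.of_seq (List.to_seq [ "b", 3; "c", 4 ]) in
  assert (naive_symmetric_diff ~data_equal:( = ) m1 m2
          = [ "a", Left 1; "b", Unequal (2, 3); "c", Right 4 ])
```
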
This diffing is reminiscent of both the diffing inherent in the virtual\ndom approach, and the enter/update/exit\npattern that characterizes the D3 visualization library.\n\nImplementing Incr.Map.map\n\nMap.symmetric_diff isn’t quite enough to build Incr.Map.map. We also need\nsome extra support from Incremental itself for controlling the pattern of\nupdates so that even though the node containing the input map has many dependent\nnodes, we only fire the nodes that are implicated by the symmetric diff.\nIndeed, we’re still working on providing first class support for this kind of\nprimitive within Incremental.\n\nEven without those hooks, we can still take some steps in the right direction,\nthe first of which is to use mutable state to keep the old map to diff against.\nEssentially, we want a function with this signature:\n\nval Incr.diff_map :\n  'a Incr.t -&gt; f:(old:('a * 'b) option -&gt; 'a -&gt; 'b) -&gt; 'b Incr.t\n\n\n\nHere, the function f gets access to the old input and old output of this same\nfunction when computing the new one. We can implement diff_map\nstraightforwardly enough:\n\nlet diff_map i ~f =\n  let old = ref None in\n  let%map a = i in\n  let b = f ~old:!old a in\n  old := Some (a,b);\n  b\n\n\n\nUsing diff_map, we can implement a simpler form of Incr.Map.map which takes\na function that transforms the elements of the map directly, rather than\nincrementally. 
It would have this signature:\n\nval Incr.Map.simple_map\n  :  ('a, 'b, 'c) Map.t Incr.t\n  -&gt; f:('b -&gt; 'd)\n  -&gt; ('a, 'd, 'c) Map.t Incr.t\n\n\n\nAnd here’s the implementation.\n\nlet simple_map m ~f =\n  diff_map m ~f:(fun ~old m -&gt;\n      match old with\n      | None -&gt; Map.map m ~f\n      | Some (old_in,old_out) -&gt;\n        let diff = Map.symmetric_diff ~data_equal:phys_equal old_in m in\n        Sequence.fold diff ~init:old_out ~f:(fun acc (key,change) -&gt;\n            match change with\n            | `Left _ -&gt; Map.remove acc key\n            | `Right d | `Unequal (_,d) -&gt; Map.add acc ~key ~data:(f d)\n          )\n    )\n\n\n\nThe implementation is simple enough: when the map changes, we diff the old input\nand the new input. By folding over that diff, we can efficiently update the old\noutput to produce the new output.\n\nImplementing the full Incr.Map.map is more complicated, because the fact that\nf is a function that consumes and produces incrementals means that you need to\nconstruct a more complicated incremental graph – effectively, you need to\nsplit, transform and merge the results.\n\nWe can bridge the difference between Incr.Map.simple_map and Incr.Map.map\nwith the following two functions.\n\nval Incr.Map.split :\n  ('a,'b,'cmp) Map.t Incr.t -&gt; ('a,'b Incr.t,'cmp) Map.t Incr.t\nval Incr.Map.join :\n  ('a,'b Incr.t,'cmp) Map.t Incr.t -&gt; ('a,'b,'cmp) Map.t Incr.t\n\n\n\nIncr.Map.split takes an incremental map and produces an incremental map whose\ndata is itself incremental. The idea is that the outer incremental updates only\nwhen a key is added or removed from the input map. Changes to the input map that\nchange data of an existing key instead lead to an update to the corresponding\ninner incremental. 
Incr.Map.join is simply the inverse operation, taking an\nincremental map full of incrementals, and producing a simple incremental map.\n\nUsing these two functions together, we can convert our simple_map into a full\nimplementation of Incr.Map.map, as follows.\n\nlet map m ~f =\n  Incr.Map.join (simple_map ~f (Incr.Map.split m))\n\n\n\nWhile there are hacks to approximate them using Incr.map and Incr.bind, the\nordinary incremental interface isn’t enough to build a version of\nIncr.Map.join and Incr.Map.split with the right performance characteristics.\nFor that reason, we’ve started work on an Expert interface within Incremental\nthat lets you create incremental nodes with precise control over their\ndependencies and when they refire.\n\nWhy virtual-dom?\n\nAlmost everyone that I’ve discussed these ideas with, both inside of Jane Street\nand beyond, has asked the question: given that you have Incremental, why bother\nwith using a virtual dom at all? After all, Incremental provides a mechanism for\ntracking which parts of the computation have changed. Why should one have two\nlayers that are doing something so similar?\n\nWe’ve made things even more duplicative by diffing not just the virtual dom but\nalso our own internal data structures as a way of creating more efficient\nincremental computations on top of ordinary functional data types.\n\nBut despite the seeming redundancy of this approach, I think it has real value.\nThat’s because, although the optimizations provided by Incremental are valuable,\nprogramming with Incremental is a worse experience than doing ordinary\nfunctional programming with immutable values. The big win of virtual dom is that\nit lets you do most of your programming in a typical functional style with\nreasonable performance. 
You only need to resort to using Incremental when the\nperformance provided by this simple approach isn’t sufficient.\n\nBut diffability isn’t just useful on the output of your incremental computation.\nIt’s also helpful on the inputs, and that’s where the diffability of\nCore_kernel’s maps comes in handy. That way, we can write our code for\nupdating our model in an ordinary functional style, but still use incremental to\nefficiently scaffold our view functions on top.\n\nSumming up\n\nWe have some beginning versions of all of this infrastructure internally at this\npoint. In particular, we’ve built fast versions of Incr.Map.map and friends,\nand used this for creating some simple dynamic web applications. The results so\nfar are pretty promising. By paying some careful attention to how the input data\nchanges and incrementalizing accordingly, we can build applications that display\nthousands of rows of data of which hundreds of cells are changing every second\nwith relatively low latency, able to respond to every update in just a handful\nof milliseconds.\n\nI doubt our performance is as good as more widely used and likely better\noptimized frameworks like React and Mercury. But the thing that has struck me\nabout all of this work is how easy and natural it has been. 
With only a small\namount of effort, we’ve been able to leverage existing tools, from Incremental\nto Async to virtual-dom, to write highly responsive applications in a simple and\nnatural style, and one that fits naturally within our existing OCaml libraries\nand tools.\n\nAnother interesting question is how this all relates to the use of FRP for UIs.\nI’ve written a bit about the difference between FRP and\nSAC, but the short version is that FRP is more about modeling temporal\nprocesses, and SAC is purely about optimization.\n\nFrom my perspective, SAC fits more cleanly into the goal of simply rendering\nvirtual-dom, and I can’t see as much value in having time-oriented operations as\na first-class construct. And the fact that SAC is focused purely on optimization\nallows it to do a better job of that. In particular, the dynamism of operations\nlike bind makes it possible to precisely control the dependency structure of the\nsystem, and do things like quiesce parts of the UI that are not in view. This is\nharder to do in an FRP system without introducing memory leaks.\n\nAll of which makes me think that self adjusting computation is a useful idiom\nfor UI programming, and perhaps one that should be adopted more widely.\n",
        "url"      : "https://blog.janestreet.com/self-adjusting-dom-and-diffable-data/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Self Adjusting DOM",
        "date"     : "February 6, 2016",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 16,
        "content"  : "I’ve been thinking recently about how to\nstructure dynamic web applications, and in particular about the role that\nincremental computation should play.\n\nIn this post, I’d like to outline an approach we’ve been experimenting with\ninternally which uses Incremental, a general\npurpose library for building\nso-called\nself adjusting computations.\nSelf adjusting computations are essentially graph-structured computations that\ncan be updated efficiently when their inputs change.\n\nI’ll describe this all in terms of OCaml, which is the language we’re doing\nthese experiments in (courtesy of the excellent js_of_ocaml), but the\nideas should be applicable to other languages.\n\nElm in OCaml\n\nLet’s start by outlining a simple, non-incremental approach, inspired by\nthe Elm Architecture. In that\narchitecture, there are three primary elements from which an application is\ncomposed: the model, the view and a set of actions.\n\nThe model represents the logical state of the application. This includes\nthings like the data being displayed on the page, the current page being\ndisplayed, and the part of the page that’s in focus. But it omits most\npresentation details, and doesn’t specify the concrete choice of dom elements\nused.\n\nActions are the things that can happen to the model. The arrival of new data\nto be displayed, or the click of a mouse, are translated into actions which are\nin turn interpreted to update the state. These are important, but don’t affect\nthe incrementality story much, so we won’t discuss them in detail.\n\nThe view is a function that takes the current model and produces a vdom\n(virtual dom) tree, an immutable datastructure that represents the intended\nstate of the dom. It is the view function that makes the concrete presentation\nchoices that are left unsaid by the model. 
The view determines some of the\ndynamic behavior of the application as well, since within the vdom you can\nregister callbacks on keypresses or mouse clicks that enqueue actions to later\nbe applied to the state.\n\nWe can wrap this up into a module signature, as follows.\n\nmodule type App_intf = sig\n  module Model : sig\n    type t\n  end\n\n  module Action : sig\n    type t\n    val apply : t -&gt; Model.t -&gt; Model.t\n  end\n\n  val view : Model.t -&gt; (Action.t -&gt; unit) -&gt; Vdom.t\nend\n\n\n\nNote that the second argument to view is a function that can be used for\nscheduling actions to occur. This function is used to build callbacks which are\nattached to the vdom.\n\nWe can combine this with a simple interface for starting up an application:\n\nval start_app\n  : (module App_intf with type Model.t = 'model)\n  -&gt; init:'model\n  -&gt; unit\n\n\n\nwhich is responsible for running the display loop.\n\nThis isn’t quite enough; in particular, I’ve omitted the necessary hooks for\nkicking off asynchronous processes for doing things like grabbing data from a\nserver. But the model above is a pretty good start, and gives you a sense of the\nstructure of an Elm-style application.\n\nThis approach isn’t too bad from a performance perspective. In particular, the\nstart_app function minimizes the amount of churn to the dom by, on every\naction, diffing the newly generated vdom against the previous instantiation, to\nproduce a minimal patch to apply to the dom proper. And modifications to the dom\nitself are quite expensive.\n\nBut this approach doesn’t scale to big, complex UIs. That’s because, even though\nmost actions change the model in only small ways, the full vdom tree has to be\ncreated every time. 
If the vdom is large and complicated, this is a serious\nproblem.\n\nIncremental Basics\n\nLet’s see how we can use Incremental to optimize generation of the vdom.\n\nImagine we want to display information about the state of a set of machines in\nour datacenter. For each machine, assume we have two pieces of data: the\ntemperature, and whether that particular server is currently selected. Now,\ngiven incremental values representing the data for just one machine, how can we\nuse Incremental to produce the vdom?\n\nIncremental provides operators for building computations on top of incremental\nvalues. For example, the function Incr.map2 has this signature.\n\nval Incr.map2 : 'a Incr.t -&gt; 'b Incr.t -&gt; f:('a -&gt; 'b -&gt; 'c) -&gt; 'c Incr.t\n\n\n\nThis lets us build a computation that takes two incremental inputs, and combines\nthem to make a single new incremental. We can use this for generating an\nincremental vdom to display our machine.\n\nlet view temperature selected = \n    Incr.map2 temperature selected\n      ~f:(fun temperature selected -&gt;\n          Vdom.text\n            (if selected then [Vdom.class_ \"selected\"] else [])\n            (sprintf \"%F degrees\" temperature))\n\n\n\nWe can write this a bit more naturally using the\nppx_let syntax extension.\nEssentially, ppx_let allows us to encode maps using ordinary let binding\nsyntax.\n\nlet view temperature selected =\n  let%map temperature = temperature and selected = selected in\n  Vdom.text \n    (if selected then [Vdom.class_ \"selected\"] else [])\n    (sprintf \"%F degrees\" temperature)\n\n\n\nThe key issue here is that the code regenerating the text node will only be\nrerun when necessary, i.e., when the value of either selected or\ntemperature has changed. 
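As a toy illustration of that recompute-only-on-change behavior, here is a small mutable stand-in written in plain OCaml. The names and the dirty-bit mechanism are entirely made up for this sketch: the real Incremental library propagates changes through a dependency graph rather than polling flags, and this toy supports only a single consumer.

```ocaml
(* A toy, mutable stand-in for map2's recompute-only-on-change behavior.
   Hypothetical mini-API for illustration; not the Incremental library. *)
type 'a cell = { mutable value : 'a; mutable dirty : bool }

let cell v = { value = v; dirty = true }

(* Mark the cell dirty only if the value actually changed (a crude
   analogue of Incremental's cutoff). *)
let set c v =
  if c.value <> v then begin
    c.value <- v;
    c.dirty <- true
  end

let recomputations = ref 0

(* map2 returns a thunk that reruns f only when an input has changed
   since the last read. Single consumer only: reading clears the bits. *)
let map2 a b ~f =
  let out = ref (f a.value b.value) in
  a.dirty <- false;
  b.dirty <- false;
  fun () ->
    if a.dirty || b.dirty then begin
      incr recomputations;
      out := f a.value b.value;
      a.dirty <- false;
      b.dirty <- false
    end;
    !out
```

Reading the resulting thunk twice without touching an input performs no recomputation, while setting a genuinely new temperature reruns the function exactly once.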
In a complex view with lots of incremental inputs\nand lots of vdom nodes built on top of them, judicious use of map allows you to\nrecompute only the vdom nodes that are in need of an update.\n\nUsing bind\n\nIt turns out that map isn’t enough. One limitation of map is that the\ndependencies introduced by map are static: e.g., Incr.map2 a b ~f will\nproduce a node that reruns f every time a or b change.\n\nBut sometimes, you want dynamic rather than static dependencies. For a trivial\nexample, imagine that our machine view has different inputs that might be used\nin different situations, say, there’s a mode that determines whether uptime or\ntemperature are displayed. We could implement such a view on top of map as\nfollows.\n\nlet view temperature uptime selected mode =\n  let%map temperature = temperature\n  and     uptime      = uptime\n  and     selected    = selected\n  and     mode        = mode\n  in\n  Vdom.text\n    (if selected then [Vdom.class_ \"selected\"] else [])\n    (match mode with\n     | Temperature -&gt; sprintf \"%F degrees\" temperature\n     | Uptime -&gt; Time.Span.to_string uptime)\n\n\n\nHere, the appearance of the dom node is dynamic, but the dependencies are not.\nEven if you’re in Temperature mode, you’ll still recompute the Vdom when the\nuptime changes.\n\nOn this small scale, the extra cost is trivial. But as you consider larger, more\ncomplicated views, the ability to control dependencies precisely can have real\nvalue.\n\nIn this case, we can control the dependencies using bind. Here’s the signature\nof bind:\n\nval bind : 'a Incr.t -&gt; ('a -&gt; 'b Incr.t) -&gt; 'b Incr.t\n\n\n\nThe signature is deceptively simple, but it lets you do something powerful. 
In\nparticular, Incr.bind i f returns an incremental whose dependencies, and even\nwhose interior nodes, are chosen dynamically by f.\n\nHere’s a simple rewrite of the code above that takes advantage of bind.\n\nlet view temperature uptime selected mode =\n    let text =\n      Incr.bind mode (function\n          | Temperature -&gt; \n            let%map x = temperature in sprintf \"%F degrees\" x\n          | Uptime -&gt; \n            let%map x = uptime in Time.Span.to_string x\n        )\n    in\n    let%map text = text and selected = selected in\n    Vdom.text\n      (if selected then [Vdom.class_ \"selected\"] else [])\n      text\n\n\n\nHere, bind lets us create a text incremental that depends either on the\ntemperature or on the uptime, depending on the mode. We can write this with\nour syntax extension, using its specialized match syntax.\n\nlet view temperature uptime selected mode =\n    let text =\n      match%bind mode with\n      | Temperature -&gt; \n        let%map x = temperature in sprintf \"%F degrees\" x\n      | Uptime -&gt; \n        let%map x = uptime in Time.Span.to_string x\n    in\n    let%map text = text and selected = selected in\n    Vdom.text\n      (if selected then [Vdom.class_ \"selected\"] else [])\n      text\n\n\n\nOne thing that’s nice about ppx_let is that it makes it easier to separate\nthinking about what your code does from how it’s incrementalized. If you ignore\nthe %map and %bind annotations, what you’re left with is enough to\nunderstand the meaning of the computation that’s being incrementalized. 
The\nannotations are only important for understanding the performance characteristics\nof the incremental recomputation.\n\nDecomposing incrementals\n\nHere’s an updated version of our App_intf which includes Incremental.\n\nmodule type App_intf = sig\n  module Model : sig\n    type t\n  end\n\n  module Action : sig\n    type t\n    val apply : t -&gt; Model.t -&gt; Model.t\n  end\n\n  val view : Model.t Incr.t -&gt; (Action.t -&gt; unit) -&gt; Vdom.t Incr.t\nend\n\n\n\nThe only change is the view function, which instead of taking a Model.t and\nreturning Vdom.t now takes a Model.t Incr.t and returns a Vdom.t Incr.t.\nAnd instead of calling this function on every update, we simply call it once at\nthe beginning. The start_app function would be responsible for repeatedly\nupdating the Model.t Incr.t as the model changes, and can then read off the\nnew vdom node from the Vdom.t Incr.t.\n\nThis all sounds good on the surface, but there’s a fly in this ointment, which\nis that my earlier examples were built on the assumption that the different\ninputs to the computation were already broken down into separate incremental\nvalues. But here, we have one big incremental value containing the entire model.\nHow do we apply Incremental effectively in this case?\n\nLet’s revisit our server-display example from before. Now, instead of assuming we\nhave a different incremental for each property of a server, imagine we have one\nincremental representing the full state of one server. 
We can use the following\nas our model type:\n\nmodule Model = struct\n      type display_mode = Temperature | Uptime\n      type t = { temperature: float;\n                 uptime: Time.Span.t;\n                 selected: bool;\n                 mode: display_mode;\n               }\n      [@@deriving fields]\n    end\n\n\n\nThe deriving annotation above provides us with accessor functions for each\nfield, which will be useful shortly.\n\nIt’s easy enough to write a view function that returns an incremental that\nrecomputes in its entirety every time the model changes, as shown below.\n\nlet view model =\n  let%map model = model in\n  let text =\n    match model.mode with\n    | Temperature -&gt; \n      sprintf \"%F degrees\" model.temperature\n    | Uptime -&gt; \n      Time.Span.to_string model.uptime\n  in\n  Vdom.text\n    (if model.selected then [Vdom.class_ \"selected\"] else [])\n    text\n\n\n\nAnd for this tiny example, that’s likely the correct thing to do. But we’ll want\nto incrementalize more precisely for real world examples, so let’s see how we\ncould do that in this case.\n\nWhat we effectively need to do to incrementalize this is to convert our one big\nincremental model into a number of smaller incrementals. We can do that by\nprojecting out individual components of the model using map.\n\nFor example, if we write:\n\nIncr.map model ~f:(fun m -&gt; m.mode)\n\n\n\nor, equivalently\n\nIncr.map model ~f:Model.mode\n\n\n\nWe’ll get an incremental that contains only the mode. Critically, incrementals\nthat depend on this one will only update when the mode actually changes,\nnot when, say, the temperature is updated. 
That’s because Incremental cuts off\ncomputations when the new output is physically equal to the old one, so that\neven if the model changes, each projected incremental will only propagate the\ncomputation if its data has changed.\n\nWe can use this approach to first project out the fields of the model record\ninto different incrementals, and then build out computation on top of that. That\nlooks like this:\n\nlet view model =\n      let mode        = Incr.map model ~f:Model.mode        in\n      let temperature = Incr.map model ~f:Model.temperature in\n      let selected    = Incr.map model ~f:Model.selected    in\n      let uptime      = Incr.map model ~f:Model.uptime      in\n      let text =\n        match%bind mode with\n        | Temperature -&gt; \n          let%map x = temperature in sprintf \"%F degrees\" x\n        | Uptime -&gt; \n          let%map x = uptime in Time.Span.to_string x\n      in\n      let%map text = text and selected = selected in\n      Vdom.text\n        (if selected then [Vdom.class_ \"selected\"] else [])\n        text\n\n\n\nAnd this has basically the right incremental structure.\n\nThis is a good start, but we still don’t really have the full story. In\nparticular, the above approach to incrementalization makes sense when our\noverall data is organized as simple static structures like records. But it’s not\nclear what to do when we have more complex data structures. For example, what if\nwe had a collection of machines stored in a map or a set? We don’t yet have a\nway of efficiently handling this kind of complex abstract data type using\nIncremental.\n\nMore on that in my next post.\n",
        "url"      : "https://blog.janestreet.com/self-adjusting-dom/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Incremental computation and the web",
        "date"     : "January 30, 2016",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 11,
        "content"  : "I’ve recently been thinking about the world of JavaScript and web applications.\nThat’s odd for me, since I know almost nothing about the web. Indeed, Jane\nStreet’s use of web technologies is quite minimal – nearly all of our user\ninterfaces are text based, and all told we’ve been pretty happy with that.\n\nBut there are real limitations to console apps, and if you need something richer\nthat’s cross-platform, the web is pretty appealing. For us it’s made yet more\nappealing by the fact that OCaml, our language of choice, compiles into\nJavaScript via js_of_ocaml.\n\nSo recently, when a few people internally got interested in trying out\nJavaScript-based UIs, I dug in a little to try to understand the landscape, and\nhelp us figure out what approach to take.\n\nVirtual DOM, all at once\n\nI started by trying to understand more about the approaches taken in the wider\nworld for building JavaScript apps. One idea the struck me as particularly\ninteresting was virtual DOM. Virtual DOM showed up first in Facebook’s\nReact, but has since inspired other\nimplementations, like Matt Esch’s\nvirtual-dom library, which in turn\nis the basis of the Mercury web framework\nand Elm’s Html\nlibrary. Other libraries have wrapped and extended React, like ClojureScript’s\nOm framework.\n\nTo understand the appeal of virtual DOM, you first need to understand what the\nworld looks like without it. JavaScript applications in the browser are\nfundamentally tied to the DOM, which is the tree of objects that reflects the\nstructure of the HTML that the page is built from. The DOM is wired into the\nbrowser, and mutating the DOM is how you change what’s shown on the screen, a\nnecessity for a dynamic web app.\n\nBut working directly with DOM can be awkward. 
For one thing, it encourages you\nto write your display logic twice: once to create the initial state of your\npage, and then again for the code that updates the DOM in response to external\nevents.\n\nBut it’s worse than just having to write your logic twice: the second time is\nalso trickier. That’s because, for performance reasons, you need to minimize\nchanges to the DOM, since those changes can cause expensive reflows and the\nlike. Doing this kind of change minimization by hand can be pretty painful.\n\nThe goal of virtual DOM is to let you write your display logic just once, and to\ndo so in a convenient, straight-ahead style. The idea is simple: instead of\nmodifying the DOM directly, you create immutable trees that represent the DOM\nyou’d like to have, a virtual DOM. Every time your state changes, you\nrecompute the virtual DOM, and then diff the new virtual DOM against the old.\nThe resulting patch can then be applied to the actual DOM, which minimizes\nchanges to the DOM proper.\n\nThis makes it easier to express your display logic cleanly. Elm uses virtual DOM\nas part of the Elm\narchitecture. In that\napproach, the state of the application is kept in a model value, which\nabstractly represents the state of the application, omitting presentation\ndetails. A separate view function is used to convert the model into a virtual\nDOM tree that describes the page that should be shown to the user.\n\nThe application is made dynamic by adding an action type which summarizes the\nkinds of changes that can be made to the model. Actions are enqueued either from\ncallbacks embedded in the virtual DOM, or by other jobs that communicate with\nthe outside world via web requests and the like. 
When the action is applied to\nthe model, a DOM update is done by recomputing the virtual DOM, and then diffing\nand patching the real DOM accordingly.\n\nVirtual DOM, incrementally\n\nAs described, the above approach involves computing the virtual DOM from scratch\nafter every action. This is done even if the change to the DOM implied by the\naction is small, which is the common case. Essentially, every key press and\nmouse click causes the entire virtual DOM to be recomputed.\n\nIn a world where DOM updates are the only expense that matters, this isn’t so\nbad. And for sufficiently small web applications, that’s almost right. But once\nyou’re creating large, dynamic UIs, this simple story falls apart, and the cost\nof recreating the virtual DOM every time matters.\n\nThat’s why in all of these virtual DOM APIs and frameworks, there’s some form of\nincrementalization built in, a way to avoid paying the full cost of rebuilding\nthe virtual DOM when the logical changes are small.\n\nIn React, for example, the state is organized into a set of hierarchical\ncomponents, each with its own render function. These components are structured\nto match the structure of the HTML that they generate, with the idea that you’ll\nonly have to re-render the few components whose input data has actually changed.\nReact effectively memoizes the render computation at the component level.\n\nElm, rather than tying the incrementalization directly to a framework-level\nnotion of component, lets you introduce memoization in the construction of\nindividual virtual DOM nodes. 
To do this, Elm’s Html module exposes a set of\n“lazy” functions with roughly these signatures (shown with OCaml syntax).\n\nval lazy1 : ('a -&gt; Html.t) -&gt; 'a -&gt; Html.t\nval lazy2 : ('a -&gt; 'b -&gt; Html.t) -&gt; 'a -&gt; 'b -&gt; Html.t\nval lazy3 : ('a -&gt; 'b -&gt; 'c -&gt; Html.t) -&gt; 'a -&gt; 'b -&gt; 'c -&gt; Html.t\n\n\n\nHere, the first argument is the render function, and the remaining arguments are\nthe values to be passed to the render function.\n\nThe idea is that a call to one of these lazy functions won’t call the render\nfunction immediately. Instead, it creates a special node that stores the render\nfunction and its arguments for later. The render function is only called as part\nof the process of diffing two virtual DOM trees. When the diff gets to the point\nof comparing two such nodes, it first compares the things that the node was\nbuilt from, i.e., the render function and its arguments. If they’re the same,\nthen the diff is empty. If they differ, then the render function is run to\ncompute more of the tree, and the diffing process continues from there.\n\nIt’s worth noting that forcing the render function for a given node to run will\ncreate more of the virtual DOM tree, but it won’t necessarily build everything\nbelow that node. In particular, the tree that’s created may contain yet more\nlazy nodes, which won’t be forced until the diffing process gets to them.\n\nBy making enough nodes lazy in this way, you can incrementalize the computation\nof the entire virtual dom tree, only forcing the recomputation of parts of the\nvirtual dom that could have changed given the changes in the underlying data\nmodel.\n\nElm’s approach has some limitations. While it doesn’t limit memoization to a\nparticular notion of a component, it does tie it to nodes in the DOM tree. 
This\ncan be limiting, since it prevents you from sharing other parts of the\ncomputation that don’t result concretely in DOM nodes.\n\nIt’s also a little anti-modular, in that you basically need to call your lazy\nfunction on simple values and top-level functions, so ordinary functional\nprogramming modularization techniques, which often rely on passing around\nclosures, don’t work as well as you’d hope.\n\nBeyond virtual DOM\n\nVirtual DOM isn’t the only approach to simplifying the process of programming\nthe DOM. Another example I ran across is Mike Bostock’s amazing D3\nlibrary. D3 has some of the same goals as virtual DOM, in\nthat it aims to provide a nice way to construct complex web pages based on some\nmore abstract data model. Like virtual DOM, D3’s approach lets you specify the\nview logic once, while producing a view that responds efficiently to changing\ndata. D3 is doing this in the service of data visualization, but the approach it\ntakes is not limited to that domain.\n\nWhere virtual DOM encourages you to think of your view calculation as an all at\nonce affair, D3 makes you think about incrementalization explicitly where it\nmatters. In particular, when you specify how the view changes in response to\ndata, you do so by explicitly specifying what happens in three cases: enter,\nupdate, and exit. The enter case\ncorresponds to new data points arriving, update corresponds to data points\nthat are changing, and exit corresponds to data being removed.\n\nThese transformations are specified using a spiffed up version of the DOM\nselectors API, which lets you select a collection of nodes by stating\nconditions that those nodes satisfy. You can then specify ways of transforming\nthose nodes, and, somewhat surprisingly, specify the creation of nodes that\ndon’t exist yet. This is done using the append operation, and is all part of\nwhat’s called data binding in the D3 world.\n\nIf this sounds confusing, well, I found it confusing too. 
But the D3 approach\nhas some good things going for it. For one thing, it gives you a natural way of\nthinking about animations, since you can specify simple animations to run on the\nenter/exit/update actions, which is more awkward in virtual DOM based\napproaches.\n\nTo borrow an analogy from circuits, virtual DOM is level-triggered, meaning\nthe view depends only on the current value of the state; but D3 is\nedge-triggered, meaning that the display logic can depend on the precise\ntransition that’s occurring. This is a real difference in the models, but I’m\nnot sure how important it is in practice.\n\nTo some degree, you can get around this issue on the virtual DOM side by\nexpressing more time-dependent information in the model. Also, you can add\nedge-triggered events on top of your virtual DOM, which React does. That said,\nit’s not as front-and-center in the Virtual DOM API as it is with D3, where\nedge-triggered animations are an easy and natural part of the design.\n\nIncrementality everywhere\n\nGiven that incrementality seems to show up in one form or another in all of\nthese web frameworks, it’s rather striking how rarely it’s talked about.\nCertainly, when discussing virtual DOM, people tend to focus on the simplicity\nof just blindly generating your virtual DOM and letting the diff algorithm sort\nout the problems. The subtleties of incrementalization are left as a footnote.\n\nThat’s understandable, since for many applications you can get away without\nworrying about incrementalizing the computation of the virtual DOM. But it’s\nworth paying attention to nonetheless, since more complex UIs need\nincrementalization, and the incrementalization strategy affects the design of a\nUI framework quite deeply.\n\nThe other benefit of thinking about incrementalization as a first class part of\nthe design is it can lead you in new directions. 
In that vein, I’ve been\nexperimenting with using self-adjusting computations, as embodied by our\nIncremental library, as another\napproach to incrementalizing computation of the virtual DOM.\n\nSelf-adjusting computations is a general purpose approach to building efficient\non-line computations developed by Umut\nAcar in his dissertation.\nThinking about Incremental in the context of GUI development has led us to some\nnew ideas about how to build efficient JavaScript GUIs, and some new ideas about\nhow Incremental itself should work. I hope to write more about that in an\nupcoming post.\n\n(You can check out the next post here.)\n\nThanks\n\nMost of what I’ve written here comes from talking to people who know a lot more\nabout the web than I do, and I wanted to thank them. I had some very\nenlightening conversations with Jordan Walke about React’s design and history.\nI’ve also talked a bunch to Evan Czaplicki about Elm, and I’m indebted to\nSpiros Eliopoulos, who helped me learn a\nbit about D3, in part through his\nocaml-d3 bindings. Also thanks to Hugo\nHeuzard and Izzy Meckler for writing a bunch of useful code and helping me learn\nabout js_of_ocaml and more generally about various details of JavaScript and\nmodern browsers.\n",
        "url"      : "https://blog.janestreet.com/incrementality-and-the-web/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Why OCaml?",
        "date"     : "January 25, 2016",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "\n  \n\n\nHere’s a post from a talk I gave this last summer during our internship program\nabout why we use OCaml. It spends a lot of time on how OCaml fits into the space\nof programming language designs, and why we think OCaml is in a real sweet spot\nin that design space, especially for the kind of work we do at Jane Street.\n\nWarning: it’s a very informal talk, with lots of questions and answers from the\naudience, not all of which are clearly audible, for which I apologize. Still, I\nhope people will get something out of it.\n",
        "url"      : "https://blog.janestreet.com/why-ocaml/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Testing with expectations",
        "date"     : "December 2, 2015",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 5,
        "content"  : "Testing is important, and it’s hard to get people to do as much of it as they\nshould. Testing tools matter because the smoother the process is, the more tests\npeople will write.\n\nEspecially in the functional programming world, most of the talk about testing\ntools is focused on tools for property-based testing, like the\nvarious\nand\nsundry quickcheck-style\nsystems. These are great, but sometimes, you don’t want to write down properties\n– what you want is to write your tests in terms of simple, concrete scenarios.\n\nWe’ve recently added support for what we call expect tests, a kind of test\noptimized for this kind of testing. Expect tests allow you to write test scenarios\nwithout having to manually write out the output generated by the code you’re\ntesting. Instead, that output is captured and recorded automatically for you, in\na way that makes it easy to integrate into the source of the test.\n\nOur expect tests were inspired by Mercurial’s unified test\nformat. Unified tests are designed for\ntesting command-line tools like hg, and so are specialized to the shell.\n\nHere’s an example using cram, an implementation of\nthe unified test idea that is independent of Mercurial. Let’s say we want to\ntest the UNIX sort command. We might start by writing a test file, simple.t,\nthat looks like this.\n\nDump some lines into a file\n\n  $ cat &gt; colors &lt;&lt; HERE\n  &gt; red\n  &gt; yellow\n  &gt; green\n  &gt; HERE\n\nsort the file and dump to stdout\n\n  $ sort colors\n\n\n\nIf you then run cram on this, cram will show you that the test failed by showing\nyou a diff.\n\nexpect-test $ cram simple.t\n!\n--- simple.t\n+++ simple.t.err\n@@ -10,5 +10,11 @@\n sort the file and dump to stdout\n\n   $ sort colors\n+  green\n+  red\n+  yellow\n\n\n\nIt also creates a new file, simple.t.err, which contains the output of running\nthe script, intermixed with the script itself. 
You can accept the new version\njust by moving the err file over the original.\n\nmv simple.t.err simple.t\n\n\n\nIf you run cram now, you’ll see that the tests pass.\n\n$ cram simple.t\n.\n# Ran 1 tests, 0 skipped, 0 failed.\n\n\n\nIf we break the tests somehow, then the diff will show us exactly what failed.\nFor example, if we replace sort with cat, here’s what Cram will show us:\n\nbash-3.2$ cram simple.t \n!\n--- simple.t\n+++ simple.t.err\n@@ -10,7 +10,7 @@\n sort the file and dump to stdout\n\n   $ cat colors\n-  green\n   red\n   yellow\n+  green\n\n# Ran 1 tests, 0 skipped, 1 failed.\n\n\n\nNote how good the diff is for seeing how your test failed.\n\nStarting with the development of Iron last\nyear, we started using cram tests pretty extensively for command-line programs.\nWe found it to be a very productive idiom, but it’s pretty awkward to apply\noutside of the command-line domain. That’s why we started thinking about how to\nget the benefits of cram, but in OCaml.\n\nBreaking out of the shell\n\nUnified tests are great for three reasons:\n\n\n  they let you combine the scenario and the output of that scenario (and\ncomments) into one readable file\n  they help you construct the file automatically\n  they display test failures as easy-to-interpret diffs.\n\n\nNone of these advantages is tied to using the shell. To bring this to OCaml,\nthough, we needed to figure out a reasonable way of embedding these tests in an\nOCaml program, without breaking all of the tooling. We did this by leveraging\nOCaml’s annotations, which let us get the data we need in place without breaking\nfrom the ordinary syntax of an OCaml program. 
That means that tools like\nmerlin and\nocp-indent and editor modes like\ntuareg will work without incident.\n\nWe can write the OCaml analogue of our cram test by creating the following file,\nnamed simple.ml.\n\nopen Core.Std\n\nlet%expect_test \"simple sort\" =\n  let sorted = List.sort ~cmp:String.compare [\"red\";\"yellow\";\"green\"] in\n  [%sexp_of: string list] sorted |&gt; Sexp.to_string_hum |&gt; print_endline;\n  [%expect {| |}]\n\n\n\nHere, the let%expect_test introduces a new test, and registers it with our\ninline test framework.\n[%expect {| |}] introduces a section where output is captured, and multiple\nsuch declarations can go in a single test.\n\nSince we haven’t actually filled in the output, running the test will fail.\nHere’s the diff it would show.\n\nopen Core.Std\n\n  let%expect_test \"simple sort\" =\n    let sorted = List.sort ~cmp:String.compare [\"red\";\"yellow\";\"green\"] in\n    [%sexp_of: string list] sorted |&gt; Sexp.to_string_hum |&gt; print_endline;\n-   [%expect {| |}]\n+   [%expect {| (green red yellow) |}]\n\n\n\nAs with cram, a new file will have been generated, in this case called\nsimple.ml.corrected, containing the updated test file. As with cram, you can\naccept the new test results by just copying the generated file over the\noriginal.\n\nThe above example is simple, but expect tests really shine when you start doing\nbigger and more complicated scenarios. And the ability to do this in ordinary\nOCaml code means you can use it for a much wider set of applications.\n\nThe source hasn’t been released yet, but it will come out as part of our\nordinary public release process, and we hope others will give it a spin when it\ndoes come out.\n",
        "url"      : "https://blog.janestreet.com/testing-with-expectations/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Quickcheck for Core",
        "date"     : "October 26, 2015",
        "authorId" : "ceastlund",
        "author"   : "Carl Eastlund",
        "tags"     : [],
        "minsToRead" : 13,
        "content"  : "Automated testing is a powerful tool for finding bugs and specifying correctness\nproperties of code. Haskell’s Quickcheck library is the most well-known\nautomated testing library, based on over 15 years of research into how to write\nproperty-based tests, generate useful sources of inputs, and report manageable\ncounterexamples. Jane Street’s Core library has not had anything comparable up\nuntil now; version 113.00 of Core finally has a version of Quickcheck,\nintegrating automated testing with our other facilities like s-expression\nreporting for counterexample values, and support for asynchronous tests using\nAsync.\n\n\n\nMotivation\n\nThere are at least two other Quickcheck implementations on OPAM. Why write\nimplementation N+1?\n\nBefore working at Jane Street, I did some research relating to randomized\ntesting (Eastlund 2009,\nKlein et al.\n2012).\nIn both cases, users of the software packages involved found it easy to write\ntests that use random data, but hard to write random distributions that actually\nproduce values that are useful to test.\n\nThis Quickcheck clone started as an attempt to address those concerns by\nbuilding tools for tuning random distributions of values. One way I’ve done that\nis building in tools for “remembering” sets of chosen values; for example,\nrandom tests won’t repeat values, and it is easy to build distributions that\ngenerate sets or sorted lists and maintain uniqueness. This design is still in\nits early stages; I don’t know if these are the sort of tools that will be needed\nmost, and I’m eager to get feedback from more users. The library has also\nevolved to integrate Quickcheck-style testing with the Jane Street Core and\nAsync libraries. 
Over time I hope this will also produce useful random\ndistributions for more of the Core types, so they will be easily testable.\n\n\n\nExample\n\nFor those unfamiliar with Haskell’s Quickcheck, the library allows users to\nwrite tests of some property of a function, then test that property on many\nautomatically-generated input values. For example, we might want to test that\nthe optimized implementation of list-append in Core is associative:\n\nTEST_UNIT \"associativity of list append\" =\n      Quickcheck.test Quickcheck.Generator.(tuple3 (list int) (list int) (list int))\n        ~sexp_of:&lt;:sexp_of&gt;\n        ~f:(fun (xs, ys, zs) -&gt;\n          &lt;:test_eq&gt;\n            (List.append xs (List.append ys zs))\n            (List.append (List.append xs ys) zs))\n\n\n\nThe test above randomly generates three lists of integers, appends them together\ntwo different ways, and tests that the results are equal. This process is\nrepeated with different randomly chosen lists each time, until an error is\nreported or the default trial count is reached. Let’s break down the parts of\nthe code here.\n\n\n  TEST_UNIT is a camlp4 syntax for unit tests.\n  Quickcheck.test is the main entry point for running a test using\nQuickcheck. It takes two required, unnamed arguments. The first is a\ngenerator, specifying the probability distribution of values to choose\nfrom when generating inputs for the test. The second is a function that\nconsumes the generated input values and runs a test. The function returns\n() if successful and raises an exception otherwise.\n  Quickcheck.Generator.(tuple3 (list int) (list int) (list int)) constructs\nthe generator that we want to use here. Most of the functions in the\nQuickcheck.Generator module are named after the types they generate; here,\ndefault probability distributions for lists of ints, combined using\ntuple3.\n  We provide the optional named argument ~sexp_of to Quickcheck.test. 
This\nargument is used to render the first generated value that triggers an error.\nThe &lt;:sexp_of&lt; ... &gt;&gt; expression is camlp4 syntax for the default sexp\nconversion for a type.\n  The final argument to Quickcheck.test is a function that takes the tuples\nof lists produced by our generator, appends them two different ways, and\ncompares the output. &lt;:test_eq&lt; ... &gt;&gt; is camlp4 syntax for an equality\ntest.\n\n\nThe example above uses the s-expression conversions and camlp4 syntax extensions\nthat are common in Jane Street’s libraries, but these things are not necessary\nfor using Quickcheck. Quickcheck.test just needs a generator built from the\nfunctions in Quickcheck.Generator and a function that raises an exception on\nfailure, and it will return () if successful or raise an exception describing\nthe nature of the failure if not.\n\n\n\nGenerators\n\nThe primary data structure used by Quickcheck is the generator, or\n'a Quickcheck.Generator.t. This corresponds to an implementation of the\nArbitrary type class in Haskell’s Quickcheck. Primarily, a generator\nrepresents a random distribution of values of type 'a, although in our\nimplementation there is a little more metadata besides that under the hood. The\nQuickcheck.Generator module provides default distributions of several types,\nand tools for creating more distributions or customizing the existing ones.\n\nIn our example above, we generated three lists of integers using the following\nexpression.\n\nQuickcheck.Generator.(tuple3 (list int) (list int) (list int))\n\n\n\nLooking at the implementation of Core.Std.List.append, we can see that the\nimplementation works in chunks of 5 elements, and changes behavior after 1000\nchunks. 
So we might want to change our generator to make sure we get lists of\nthe lengths we want to test.\n\nlet open Quickcheck.Generator in\nlet list_int = list int ~length:(`Between_inclusive (4900,5100)) in\ntuple3 list_int list_int list_int\n\n\n\nSome experimentation might show us that this still doesn’t hit the list lengths\nwe want, as often as we want. The Quickcheck.Generator.int_between\nfunction, however, is documented as stressing boundary conditions, so we should\nbe able to use it to get values at the upper and lower ends of the range we\nwant. Here, it helps us that generators form a monad. If we combine generators\nusing monadic bind, we get a weighted composition of their probability\ndistributions. We can use that to first generate lengths for our lists, then use\nthose randomly-generated lengths to build generators for the lists themselves.\n\nlet open Quickcheck.Generator in\nlet list_int =\n  int_between ~lower_bound:(Incl 4900) ~upper_bound:(Incl 5100)\n  &gt;&gt;= fun len -&gt;\n  list int ~length:(`Exactly len)\nin\ntuple3 list_int list_int list_int\n\n\n\nNow we have a generator for three lists of integers, each list with a length\nbetween 4900 and 5100 inclusive, weighted toward the ends of that range. This is\nprobably sufficient for our purposes. But if we want to go further, if we\ndecide that we need a very specific probability distribution, we can build one\nfrom scratch ourselves. 
Here is a rather idiosyncratic example that demonstrates\nthe tools available in Quickcheck.Generator.\n\nlet open Quickcheck.Generator in\nlet rec ranges_of_five_between lower upper =\n  if upper - lower &lt; 10\n  then of_list (List.range lower upper ~start:`inclusive ~stop:`inclusive)\n  else weighted_union\n         [ 5., singleton (lower + 0)\n         ; 4., singleton (lower + 1)\n         ; 3., singleton (lower + 2)\n         ; 2., singleton (lower + 3)\n         ; 1., singleton (lower + 4)\n         ; 1., of_fun (fun () -&gt; ranges_of_five_between (lower + 5) (upper - 5))\n         ; 1., singleton (upper - 4)\n         ; 2., singleton (upper - 3)\n         ; 3., singleton (upper - 2)\n         ; 4., singleton (upper - 1)\n         ; 5., singleton (upper - 0)\n         ]\nin\nlet list_int =\n  ranges_of_five_between 4900 5100\n  &gt;&gt;= fun len -&gt;\n  list int ~length:(`Exactly len)\nin\ntuple3 list_int list_int list_int\n\n\n\nThis example uses a few more functions from Quickcheck.Generator. The\nof_list function takes a list of values and produces a generator that makes a\nuniform choice among them. weighted_union creates a probability distribution\nrepresenting a weighted choice among the probability distributions of the\nassociated sub-generators. singleton produces constant-valued generators, and\nof_fun produces a lazily-evaluated (but not memoized) generator. (Memoizing\nduring random testing causes some unfortunate space leaks; it is important to be\nable to release resources after a batch of tests.) 
While this peculiar generator\nis probably not of practical use, it shows that when we need to, we can dig down\ninto the interface and build whatever probability distribution we want.\n\nOf course, it is also useful to construct generators for new types.\n\ntype bst = Leaf | Node of bst * int * bst\nlet gen : bst Quickcheck.Generator.t =\n  let open Quickcheck.Generator in\n  recursive (fun self -&gt;\n    let node =\n      self &gt;&gt;= fun l -&gt;\n      int  &gt;&gt;= fun k -&gt;\n      self &gt;&gt;| fun r -&gt;\n      Node (l, k, r)\n    in\n    union [ singleton Leaf; node ])\n\n\n\nThe function Quickcheck.Generator.recursive is a fixed-point generator that\nhelps build simply-recursive generators that need to invoke themselves and don’t\nhave additional arguments. The function union is like weighted_union, but\nfor uniform choice.\n\n\n\nObservers\n\nIn Haskell’s Quickcheck, there is a duality between the type class Arbitrary\nfor generating random values and Coarbitrary for observing inputs to random\nfunctions. Our version of Quickcheck mirrors Generator with Observer. Most\ntests using Quickcheck do not need an observer, but if you want to generate a\nrandom input for a higher-order function, you will need an observer for the\nfunction’s input type.\n\nTEST_UNIT \"function composition\" =\n  let open Quickcheck.Generator in\n  Quickcheck.test\n    (tuple3\n       (fn Quickcheck.Observer.int    char)\n       (fn Quickcheck.Observer.string int)\n       string)\n    ~f:(fun (f, g, x) -&gt;\n      &lt;:test_eq&lt; char &gt;&gt;\n        ((Fn.compose f g) x)\n        (f (g x)))\n\n\n\nHere, Quickcheck.Generator.fn creates a generator for functions. It takes two\narguments: an observer for the function’s input type and a generator for the\nfunction’s output type.\n\nThink of an observer as a “generator of decision trees”. For instance,\nQuickcheck.Observer.int might randomly generate any of the following decision\ntrees:\n\n?\n\n x&gt;5\n / \\\n?   
?\n\n  x&gt;4\n  / \\\n x&gt;2 ?\n / \\\n?   ?\n\n\n\nThese decision trees control how a randomly-generated function will use its\ninput. The generator for the function’s output is used to fill each of the ?s\nin with a concrete value. The result is a family of functions operating on the\nappropriate types, making randomly-chosen observations on the input and\nproducing randomly-chosen outputs based on those observations.\n\nIf you need to build an observer for a custom type, there are tools for that as\nwell.\n\ntype bst = Leaf | Node of bst * int * bst\nlet obs : bst Quickcheck.Observer.t =\n  let open Quickcheck.Observer in\n    recursive (fun self -&gt;\n      unmap (variant2 unit (tuple3 self int self))\n        ~f:(function\n          | Leaf -&gt; `A ()\n          | Node (l, k, r) -&gt; `B (l, k, r))\n        ~f_sexp:(fun () -&gt; Atom \"variant2_of_bst\"))\n\n\n\nAs with generators, there is a fixed point function\nQuickcheck.Observer.recursive that helps for simply-recursive types. The\nfunction unmap transforms an input of some new type into an input for which we\nalready have an observer. Variant types can be transformed to polymorphic\nvariants, which have default observers variant2 through variant6. Records\nand constructor arguments can be transformed to tuples, which have default\nobservers tuple2 through tuple6.\n\n\n\nWork in Progress\n\nOur OCaml adaptation of Quickcheck is new and still evolving. We already have\nsome changes to the library internally which will be released over time, such as\nmoving default generators and observers out of the Quickcheck module and into\nthe modules for each type. For example, Quickcheck.Generator.int becomes\nInt.gen.\n\nThere are still some pragmatic lessons to learn about how best to use our\nformulation of the library, how to calibrate our default distributions, and what\nother distributions we might want to provide. 
As always, we hope to get feedback\nfrom anyone who tries out this library so that we can improve it.\n\n\n\nHappy testing!\n",
        "url"      : "https://blog.janestreet.com/quickcheck-for-core/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "rsync rounds timestamps to the nearest second",
        "date"     : "October 7, 2015",
        "authorId" : "cperl",
        "author"   : "Chris Perl",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "I’m not sure how I’ve managed to use rsync for so many years without ever\nnoticing this, but hey, you learn something new every day!\n\n[cperl@localhost ~]$ rpm -q --qf '%{name}-%{version}-%{release}\\n' rsync\nrsync-3.0.6-12.el6\n\n[cperl@localhost ~]$ touch foo\n[cperl@localhost ~]$ stat --format \"%y\" foo\n2015-09-24 14:07:05.349357260 -0400\n \n[cperl@localhost ~]$ rsync -a foo bar\n[cperl@localhost ~]$ stat --format \"%y\" bar\n2015-09-24 14:07:05.000000000 -0400\n \n[cperl@localhost ~]$ cp -a foo baz\n[cperl@localhost ~]$ stat --format \"%y\" baz\n2015-09-24 14:07:05.349357260 -0400\n\n\n",
        "url"      : "https://blog.janestreet.com/rsync-rounds-timestamps-to-the-nearest-second/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "No (functional) experience required",
        "date"     : "August 19, 2015",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : ["interviewing"],
        "minsToRead" : 1,
        "content"  : "Jane Street is a serious functional programming shop. We use OCaml, a statically\ntyped functional language, for almost everything and have what is probably the\nlargest OCaml codebase anywhere.\n\nThis leads lots of people to think that they shouldn’t even bother applying,\nunder the assumption that we are only interested in hiring deep functional\nprogramming gurus. I think people get to this conclusion in part because they\nthink of functional languages, especially those with fancy type systems, as\narcane tools that can only be used effectively after deep study.\n\nTo the contrary, one of the reasons we started building production systems with\nOCaml was that it was relatively easy to understand, even for people with no\nformal CS background. Since then, we’ve had good experiences taking students\nwith no functional experience at all and getting them to the point of being able\nto complete a project in just a few weeks. We also have a very successful “OCaml\nBootcamp” program, where over four weeks, we train all of the incoming traders\nand many other non-engineer hires on OCaml and our development tools and\nlibraries. By the end, most of them are able to create useful applications.\n\nAll of this is to say that we don’t go out of our way to hire people who are\nalready familiar with functional programming. In practice, it’s just not that\nhard for strong programmers to pick it up after they start.\n\nThat said, an unusually large fraction (but still a minority) of the\nsoftware engineers we hire do come in with functional programming\nexperience – but that’s because of their preferences, not ours.\nProgrammers with an interest in functional languages have an extra\nreason to want to work here, and so we get a high number of good\napplicants from that pool.\n\nThere’s a more general lesson here: using well-loved tools is a good way of\nattracting (and retaining) great software engineers.\n",
        "url"      : "https://blog.janestreet.com/no-functional-experience-required/",
        "image"    : null,
        "topic"    :  ["technology","interviewing"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Introducing Incremental",
        "date"     : "July 18, 2015",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 9,
        "content"  : "I’m pleased to announce the release of\nIncremental (well\ncommented mli\nhere),\na powerful library for building self-adjusting computations, i.e.,\ncomputations that can be updated efficiently when their inputs change.\n\nAt its simplest, you can think of a self-adjusting computation as a fancy\nspreadsheet. In a spreadsheet, each cell contains either simple data, or an\nequation that describes how the value in this cell should be derived from values\nin other cells. Collectively, this amounts to a graph-structured computation,\nand one of the critical optimizations in Excel is that when some of the cells\nchange, Excel only recomputes the parts of the graph that depend on those\nchanged cells.\n\nWhat makes self-adjusting computation (or SAC) different from a spreadsheet is\nits dynamism. The structure of the computational graph in an SAC can change at\nruntime, in response to the changing input data.\n\nThis dynamism gives you quite a bit of flexibility, which can be used in\ndifferent ways. Here are a few.\n\nOn-line combinatorial algorithms. Incremental is based on work by Umut Acar\net al., on self-adjusting\ncomputations (that’s where the term comes from), and there, they were\nmostly interested in building efficient on-line versions of various combinatorial\nalgorithms. In many cases, they could match the asymptotic complexity of custom\non-line algorithms by fairly simple incrementalizations of all-at-once\nalgorithms.\n\nIncremental GUI construction. One simple and natural way to model a GUI\napplication is to structure your display as a function that generates a view\nfrom some more abstract data model.\n\nHaving a function that constructs a view from scratch at every iteration is\nsimple, but prohibitively expensive. But if you can write this function so that\nit produces an incrementalized computation, then you have a solution that is\nboth simple and efficient. 
We’ve used this technique in a number of our UIs, to\ngood effect.\n\nThis might remind you of how functional reactive programming (FRP) is used for\nconstruction of GUIs in languages like Elm. SAC and FRP\nhave different semantics – FRP is mostly concerned with time-like computations,\nand SAC is mostly about optimizing DAG-structured computations – but they are\nnonetheless closely related, especially at the implementation level. You can see\nmy post here for a description of the broader conceptual\nlandscape that includes both FRP and SAC.\n\nConfigurable computations. An example that comes from our own work is risk\ncalculations. Calculating measures of risk of a portfolio involves combining\ndata from a complex collection of interdependent models. Each of these models is\ndependent both on live data from the markets, and on configurations determined\nby users.\n\nA config change could merely tweak a coefficient, or it could change the overall\nstructure of the computation, say by changing the list of factors used by a\ngiven model. Incremental allows you to build a computation that can update\nefficiently in response to both simple data changes as well as more structural\nconfig changes, in one unified framework.\n\nA taste of Incremental\n\nIt’s hard to give a compelling example of Incremental in action in just a few\nlines of code, because what makes Incremental really useful is how it helps you\nbuild large and complex computations. Nonetheless, small examples can give you a\nsense of how the library works.\n\nTo that end, let’s walk through a few small examples. To begin, we need to\ninstantiate the Incremental functor.\n\nopen Core.Std\nmodule Inc = Incremental_lib.Incremental.Make ()\n\n\n\nEach instance thus generated is its own independent computational world. 
The\nIncremental functor is generative, meaning it mints fresh types each time it’s\napplied, which prevents values from different incremental worlds from being\nmixed accidentally.\n\nAn Incremental computation always starts at its variables. Modifications to\nvariables are how updates to input data are communicated to Incremental.\n\nLet’s write down a few variables corresponding to the dimensions of a\nrectangular prism.\n\nmodule Var = Inc.Var\n\n(* dimensions of a rectangular prism *)\nlet width_v  = Var.create 3.\nlet depth_v  = Var.create 5.\nlet height_v = Var.create 4.\n\n\n\nWe can use Var.watch to get the (trivial) incremental computations associated\nwith each variable.\n\nlet width  = Var.watch width_v\nlet depth  = Var.watch depth_v\nlet height = Var.watch height_v\n\n\n\nThe following is an incremental computation of the base area of the prism, and\nof the volume.\n\nlet base_area =\n  Inc.map2 width depth ~f:( *. )\nlet volume =\n  Inc.map2 base_area height ~f:( *. )\n\n\n\nIn order to get information out of an incremental computation, we need to\nexplicitly mark which nodes we want data from by creating observer nodes.\nBecause it knows which nodes are observed, the framework can track what parts of\nthe computation are still necessary to the results.\n\nlet base_area_obs = Inc.observe base_area\nlet volume_obs    = Inc.observe volume\n\n\n\nIn order to force the computation to run, we need to explicitly call\nInc.stabilize. 
Here’s some code that uses stabilize to run the computation and\nthen gets the information from the observers.\n\nlet () =\n  let v = Inc.Observer.value_exn in\n  let display s =\n    printf \"%20s: base area: %F; volume: %F\\n\"\n      s (v base_area_obs) (v volume_obs)\n  in\n  Inc.stabilize ();\n  display \"1st stabilize\";\n  Var.set height_v 10.;\n  display \"after set height\";\n  Inc.stabilize ();\n  display \"2nd stabilize\"\n\n\n\nIf we run this, we’ll see the following output:\n\n1st stabilize: base area: 15.; volume: 60.\n    after set height: base area: 15.; volume: 60.\n       2nd stabilize: base area: 15.; volume: 150.\n\n\n\nNote that setting the height isn’t enough to change the observed values; we need\na stabilization to make that happen.\n\nThat’s a fairly trivial computation, and there certainly isn’t much to\nincrementalize. Let’s try something a little more complicated: a function for\nmerging together an array of incrementals, using some commutative and\nassociative operator like addition or max.\n\nlet rec merge ar ~f =\n    if Array.length ar &lt;= 1 then ar.(0)\n    else\n      let len = Array.length ar in\n      let len' = len / 2 + len % 2 in\n      let ar' =\n        Array.init len' ~f:(fun i -&gt;\n          if i * 2 + 1 &gt;= len then ar.(i*2)\n          else Inc.map2 ar.(i*2) ar.(i*2+1) ~f)\n      in\n      merge ar' ~f;;\n\n\n\nBecause this is done using a binary tree as the dependency graph, the complexity\nof updating an element is log(n), where n is the size of the array. We can\nuse this for computing an average:\n\nlet average ar =\n  let sum = merge ar ~f:(+.) in\n  Inc.map sum ~f:(fun s -&gt; s /. float (Array.length ar))\n\n\n\nThis works, but we can do better, performance-wise at least, if our merge\noperation has an inverse. In that case, maintaining the sum can in principle be\ndone in constant time, by first removing the old value before adding in the\nnew. 
Incremental has a function for taking advantage of this structure.\n\nlet sum ar =\n  Inc.unordered_array_fold ~f:(+.) ~f_inverse:(-.) ar;;\n\n\n\nNow, let’s say we want to do something a little more dynamic: in particular,\nwhat if we want to compute the average of a prefix of the given array? For that,\nwe need to use the bind function, which allows us to produce new incremental\nnodes within an incremental computation.\n\nlet average_of_prefix ar length =\n  Inc.bind length (fun length -&gt;\n    average (Array.init length ~f:(fun i -&gt; ar.(i))))\n\n\n\nThe type of this function is float Inc.t array -&gt; int Inc.t -&gt; float Inc.t, so\nthe length of the prefix is a fully fledged part of the incremental computation.\nAs a result, the dependency structure of this computation changes dynamically,\ne.g., if the value of length is 7, then the computation only depends on\nlength and the first 7 elements of the array.\n\nHopefully this gives you enough of a sense of what Incremental is about to start\nthinking about where it might be useful for you. Note that the overhead of\nincremental is not inconsiderable – on my laptop, firing a single node takes on\nthe order of 30ns, which is far more than, say, summing numbers together.\nIncremental tends to be useful when the computation that is put into a single\nnode is large relative to that overhead, or when the computational graph is\nlarge relative to the sub-graph that needs to be recomputed. Our experience has\nbeen that there are plenty of applications in this space that can benefit from\nIncremental.\n",
        "url"      : "https://blog.janestreet.com/introducing-incremental/",
        "image"    : "https://blog.janestreet.com/introducing-incremental/introducing_incremental.png",
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Converting a code base from camlp4 to ppx",
        "date"     : "July 8, 2015",
        "authorId" : "jdimino",
        "author"   : "Jeremie Dimino",
        "tags"     : ["camlp4","ocaml","ppx"],
        "minsToRead" : 4,
        "content"  : "As with many projects in the OCaml world, at Jane Street we have been working on\nmigrating from camlp4 to ppx. After having developed equivalent ppx rewriters\nfor our camlp4 syntax extensions, the last step is to actually translate the\nsource code of all our libraries and applications from the camlp4 syntax to the\nstandard OCaml syntax with extension points and attributes.\n\nFor instance, to translate code using pa_ounit and pa_test, we have to\nrewrite:\n\nTEST = &lt;:test_result&lt; int &gt;&gt; ~expect:42 (f x)\n\n\n\nto:\n\nlet%test _ = [%test_result: int] ~expect:42 (f x)\n\n\n\nFor small to medium projects it is enough to just take a couple hours to\ntranslate the source code by hand. But at Jane Street, where we have a huge OCaml\ncode base making extensive use of camlp4, it is simply not realistic. So we\nneeded a tool to do that for us.\n\nWriting a tool to automatically convert the syntax\n\nSince the output of such a tool has to be accepted as the new source code that\nis committed to our repository, it must preserve the layout of the original file\nas much as possible and of course keep the comments. This means that any approach\nusing an AST pretty-printer would be extremely complex.\n\nThe path we chose is to textually substitute the foreign syntaxes in the\noriginal file for the new ones. One could imagine doing that with a tool such as\nsed, awk, perl, … however doing it properly would be tedious and it would\nbe pretty much impossible to be 100% sure it would never translate things it is\nnot supposed to. 
Plus, writing perl is not as fun as writing OCaml programs.\n\nInstead, there is an easy way to find the foreign syntaxes: using camlp4 itself.\nTo substitute the text of foreign syntaxes, the only thing we need to know is\ntheir location in the original file, and camlp4 can help us with that.\n\nWriting dummy camlp4 syntax extensions\n\nThe idea is to write for each camlp4 syntax extension a dummy one that defines\nthe same grammar productions as the real one, but instead of generating code it\nsimply records substitutions at certain locations.\n\nThen we do the following:\n\n\n  parse a file with camlp4 and our dummy syntax extensions\n  apply all the recorded substitutions to the original file\n\n\nThis approach has the advantage of interpreting the original file in the exact\nsame way as our regular syntax extensions, giving us good confidence that we did not\nchange the syntactic constructions by mistake.\n\nTo do so we define this API:\n\n(** [replace loc repl] records a text substitution that replaces the\n    portion of text pointed to by [loc] with [repl]. *)\nval replace : Loc.t -&gt; string -&gt; unit\n\n\n\nThen writing a dummy camlp4 syntax extension is pretty easy. 
For instance, for a\nsubset of pa_ounit:\n\nEXTEND Gram\n  GLOBAL: str_item;\n\n  test: [[ \"TEST\" -&gt; replace _loc \"let%test\" ]];\n\n  name_equal:\n    [[ `STRING _; \"=\" -&gt; ()\n     |            \"=\" -&gt; replace _loc \"_ =\"\n     ]];\n\n  str_item:\n    [[ test; name_equal; expr -&gt; &lt;:str_item&lt; &gt;&gt;\n    ]];\nEND\n\n\n\nOn-the-fly conversion and diffing the generated code\n\nSince this tool was convenient to use, we used it to check that our newly\nwritten ppx rewriters did the same thing as the old camlp4 syntax extensions:\n\n\n  for a given OCaml source file of a library or application, we converted it\nusing camlp4-to-ppx and saved the result\n  we processed the original file using camlp4 and the translated one using our\nppx rewriters\n  in both cases we saved the output of -dparsetree (human-readable version of\nthe internal OCaml AST) and -dsource (pretty-printed code)\n  we diffed the camlp4 and ppx outputs of -dparsetree, as well as the\noutputs of -dsource\n\n\nThis was all quite easy to do with jenga. We kept looking at the generated diffs\nuntil they were all empty.\n\nWe have quite a lot of code in our camlp4 syntax extensions, and converting them\nto ppx was a long mechanical job and so quite error-prone. Given that, this\ndiffing turned out to be really helpful for finding errors.\n\nThe Camlp4 Syntax is not quite the OCaml syntax\n\nWhile using this, we noticed that quite a few syntaxes accepted by camlp4 are not\naccepted by OCaml, for instance:\n\nlet _ x = x\n\nlet f l = List.map l ~f:fun x -&gt; x + 1\n\n\n\nThese were quite easy to fix automatically as well using camlp4-to-ppx.\n\nGitHub repo and extension\n\nWe published a slightly modified version of this tool on\nGitHub.\n\nThe method we used doesn’t work out of the box with all syntax extensions. For\ninstance, to convert code using lwt.syntax, some more work needs to be done on\ncamlp4-to-ppx. But it is a good starting point.\n",
        "url"      : "https://blog.janestreet.com/converting-a-code-base-from-camlp4-to-ppx/",
        "image"    : null,
        "topic"    :  ["technology","camlp4","ocaml","ppx"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "CPU Registers and OCaml",
        "date"     : "May 5, 2015",
        "authorId" : "vbrankov",
        "author"   : "Vladimir Brankov",
        "tags"     : ["ocaml","performance","registers","speed"],
        "minsToRead" : 4,
        "content"  : "Even though registers are a low-level CPU concept, having some knowledge about\nthem can help write faster code. Simply put, a CPU register is a storage for a\nsingle variable. CPU can keep data in memory or cache or in registers and\nregisters are often much faster. Furthermore, some operations are possible only\nwhen the data is in registers. Hence, the OCaml compiler tries to keep as many\nvariables as it can in the registers.\n\nCode with more than 13 variables is slow?\n\nConsider this primitive statistics computation:\n\nlet stats xs ys =\n  let len = Array.length xs in\n  if len Array.length ys then\n    raise not_of_same_length;\n  let sx = ref 0 in\n  let sy = ref 0 in\n  let sxx = ref 0 in\n  let syy = ref 0 in\n  let sxy = ref 0 in\n  let sax = ref 0 in\n  let say = ref 0 in\n  for i = 0 to len - 1 do\n    let x = Array.unsafe_get xs i in\n    let y = Array.unsafe_get ys i in\n    let ax = abs x in\n    let ay = abs y in\n    let xx = x * x in\n    let yy = y * y in\n    let xy = x * y in\n    sx := !sx + x;\n    sy := !sy + y;\n    sxx := !sxx + xx;\n    syy := !syy + yy;\n    sxy := !sxy + xy;\n    sax := !sax + ax;\n    say := !say + ay;\n  done;\n  !sx, !sy, !sax, !say, !sxx, !syy, !sxy\n\n\n\nRearranging just a few lines produces code 1.5-2x faster:\n\nlet x = Array.unsafe_get xs i in\n    sx := !sx + x;\n    let xx = x * x in\n    sxx := !sxx + xx;\n    let ax = abs x in\n    sax := !sax + ax;\n    let y = Array.unsafe_get ys i in\n    sy := !sy + y;\n    let xy = x * y in\n    sxy := !sxy + xy;\n    let ay = abs y in\n    say := !say + ay;\n    let yy = y * y in\n    syy := !syy + yy;\n\n\n\nWhy is that? CPU has just a few registers:\n\n\n  13 which can contain integers and pointers to arrays and records\n  16 which can contain floating point numbers\n\n\nIf the code has more than that many variables, OCaml compiler has to park the\nextra variables in memory and this parking is called spilling. 
Actually,\nspilling may happen even when there are fewer variables, because for example some\noperations like integer multiplication use extra registers.\n\nTherefore, it’s good to try to keep the number of frequently used variables to\n13 or fewer, or to rearrange the code so that fewer variables are used at the\nsame time.\n\nThe OCaml compiler can show spilled variables when called with the option\n-dreload.\n\nCalling a single function makes subsequent code slower?\n\nIf a function is called, all of the active registers are spilled because it is\nnot known whether the called function will need those registers. That can often\nbe the largest penalty when calling a function, assuming the function is not\ninlined.\n\nLet’s change the previous function:\n\nlet stats xs ys =\n  let sx  = ref 0 in\n  let sax = ref 0 in\n  let sxx = ref 0 in\n  let sy  = ref 0 in\n  let say = ref 0 in\n  let syy = ref 0 in\n  let sxy = ref 0 in\n  let len = Array.length xs in\n  if len &lt;&gt; Array.length ys then\n    failwith (Printf.sprintf \"Arrays not of the same length: %d %d\"\n      len (Array.length ys));\n  for i = 0 to len - 1 do\n    ... \n\n\n\nThis produces 1.35x slower code simply because the OCaml compiler will spill all\nof the variables because of Printf.sprintf. In each iteration, OCaml will\npull sx from memory and store it back.\n\nIt’s a pity that this is actually not necessary. OCaml has to pull sx from\nmemory and store it back just once, not in each iteration. It looks like that could\nbe improved in the OCaml compiler.\n\nRecursive functions with more parameters are faster?\n\nA function can get data from its input parameters, from its closure environment, and from\nmemory. Input parameters are normally stored in registers. Therefore, using\ninput parameters will generally be slightly faster than the closure environment\nand memory. 
This can come in handy with recursion.\n\nTake, for example, this function, which finds the index of the given integer in the\narray:\n\nlet index (a : int array) e =\n  let len = Array.length a in\n  let rec loop i =\n    if i &lt; len then begin\n      if Array.unsafe_get a i = e then Some i\n      else loop (i + 1)\n    end else None\n  in\n  loop 0\n\n\n\nThe variables e and len are pulled from the closure environment in each\niteration. Moving them to input parameters speeds up the function 1.15-1.33\ntimes:\n\nlet index (a : int array) e =\n  let rec loop e len i =\n    if i &lt; len then begin\n      if e = Array.unsafe_get a i then Some i\n      else loop e len (i + 1)\n    end else None\n  in\n  loop e (Array.length a) 0\n\n\n\nFurther reading\n\nProcessor Register\n\nRegister Allocation\n",
        "url"      : "https://blog.janestreet.com/cpu-registers-and-ocaml-2/",
        "image"    : null,
        "topic"    :  ["technology","ocaml","performance","registers","speed"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Reverse web proxy in ~50 lines of BASH",
        "date"     : "May 1, 2015",
        "authorId" : "rdouglass",
        "author"   : "Ralph Douglass",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "In the spirit of reinventing the wheel for fun, I hacked this together as a\nquick challenge to myself last week. It’s a little rough around the edges, but I\nthought it was too cute not to share. If you have any bug fixes, please post\nthem in the comments.\n\n#!/bin/bash\n\nset -e -u\n\nfunction usage() {\n  echo \"USAGE: ${0} parent {port} {lower_bound_backend_port} {upper_bound_backend_port}\"\n}\n\n[[ $# &lt; 1 ]] && usage && exit 1\n\nmode=${1};shift\ncase ${mode} in\n  parent)\n    PORT=${1};shift\n    LOWER=${1};shift\n    UPPER=${1};shift\n    socat TCP-LISTEN:${PORT},fork,reuseaddr \"EXEC:${0} child ${LOWER} ${UPPER}\"\n    ;;\n  child)\n    LOWER=${1};shift\n    UPPER=${1};shift\n    COUNT=0\n    PORT=$(shuf -i ${LOWER}-${UPPER} -n 1)\n    let \"RANGE = UPPER - LOWER\"\n    SUCCESS=false\n    while ((COUNT &lt;= RANGE)) || ${SUCCESS}; do \n      set +e\n      if socat STDIN TCP:127.0.0.1:${PORT},connect-timeout=2; then\n        SUCCESS=true\n        break\n      else\n        echo \"unable to connect to port ${PORT}\" &gt;&2\n      fi\n      set -e\n      let COUNT+=1\n      let PORT+=1\n      if ((PORT &gt; UPPER )); then\n        let 'PORT = LOWER'\n      fi\n    done\n    if ! ${SUCCESS}; then\n      echo \"HTTP/1.1 500 Internal Server Error\"\n      echo\n      echo \"All REDACTED servers are down.  Please report to REDACTED@janestreet.com.\"\n      exit 1\n    fi\n    ;;\n  *)\n    usage\n    exit 1\n    ;;\nesac\n\n\n",
        "url"      : "https://blog.janestreet.com/reverse-web-proxy-in-50-lines-of-bash/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Building a lower-latency GC",
        "date"     : "April 10, 2015",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 9,
        "content"  : "We’ve been doing a bunch of work recently on improving the responsiveness of\nOCaml’s garbage collector. I thought it would be worth discussing these\ndevelopments publicly to see if there was any useful feedback to be had on the\nideas that we’re investigating.\n\nThe basic problem is a well-known one: GCs can introduce unpredictable pauses\ninto your application, and depending on how your GC is configured, these pauses\ncan be quite long. Unpredictable latencies are a problem in a wide variety of\napplications, from trading systems to web stacks.\n\nOne approach people often take is to avoid using the allocator altogether: pool\nall your objects, and never allocate anything else. You can even keep many of\nyour pooled objects outside of the heap.\n\nThis works, but makes for a less pleasant coding experience (and code that is\ntrickier and harder to reason about.) So while pooling is a valuable technique,\nwe’d like to have a GC that lets you run with low latencies without sacrificing\nthe ability to allocate.\n\nWhat are the problems?\n\nOCaml’s garbage collector is already pretty good from a latency perspective.\nCollection of the major heap in OCaml is incremental, which means that\ncollection of the major heap can be done in small slices spread out over time,\nso no single transaction need experience the full latency of walking the major\nheap. 
Also, collection of the minor heap is pretty fast, and OCaml programs tend\nto do pretty well with a relatively small minor heap – typical advice in\nJava-land is to have a young generation in the 5-10 GiB range, whereas our minor\nheaps are measured in megabytes.\n\nStill, there are problems with OCaml’s collector.\n\nNo profiling\n\nThere’s no good way in the stock runtime to see how long the different parts of\ncollection take, and that makes it hard to optimize.\n\nFalse promotions\n\nOCaml’s generational collector is very simple: objects are typically allocated\nfirst on the minor heap, where the work is effectively three inlined\ninstructions to bump a pointer and check whether you’ve hit the end. When you do\nhit the end, you do a minor collection, walking the minor heap to figure out\nwhat’s still live, and promoting that set of objects to the major heap.\n\nIn a typical functional workload, most of your allocations are short-lived, and\nso most of the minor heap is dead by the time you do the minor collection, so\nthe walk of the minor heap can be quite cheap. But there’s always a small number\nof false promotions, objects that would have become unreachable shortly, but\nwere promoted because the minor collection came at an inconvenient time.\n\nClocking\n\nOne fundamental issue with the stock runtime is that the collector is clocked\nin terms of minor allocations – ignoring, critically, the amount of time that\nhas gone by.\n\nThis clocking makes sense for many applications, but if you’re building a server\nthat needs to respond to bursty traffic with low and predictable latencies, this\nis the opposite of what you want. Really, what you’d prefer to do is to defer GC\nwork when you’re busy, instead scheduling it at times when the application would\notherwise be idle.\n\nOne solution here is to allow the application to drive the scheduling of the GC,\nbut the runtime in its current form doesn’t really support doing this. 
In\nparticular, while you can choose to explicitly run a major slice, the collector\naccounting doesn’t take note of the work that has been done that way, so the\nmajor collector works just as hard as it did previously.\n\nFurthermore, the major slice always forces a minor collection. But running minor\ncollections all the time is problematic in its own right, since if you run them\nwhen the minor heap is too small, then you’ll end up accidentally promoting a\nrather large fraction of your minor allocations.\n\nInsufficient incrementality\n\nWhile the major collector is mostly incremental, not everything about it runs\nincrementally. In particular, when the major collector hits an array, it walks\nthe array all at once. This is problematic if you start using large arrays,\nwhich does happen when one is using pooling techniques. Similarly, the collector\nis not incremental when it comes to scanning GC roots.\n\nImmediate accounting\n\nThe stock runtime decides how big of a major slice to do based on how much was\npromoted to the major heap in the last minor collection. This is part of a\nheuristic that is meant to make sure that the collector keeps up with the rate\nof allocation, without running needlessly when the application isn’t promoting\nmuch.\n\nBut the accounting is in some sense too immediate: if you do a lot of promotion\nin a given cycle, you’re forced to do the collection immediately. While all that\nwork needs to be done, it’s not clear that it needs to be done immediately.\nAgain, for responsive systems, it’s often better to push off work until after\nthe busy times.\n\nMaking it better\n\nHappily, Damien Doligez, the author of OCaml’s GC, has been visiting with us for\nthe last few months, and has been doing a lot of good work to improve the\nruntime, and in particular to address the concerns raised above. 
Here’s the\nsummary of the changes made thus far.\n\nBetter profiling\n\nA set of probes was added to the GC, allowing us to record every phase of the\ncollection process in quite a detailed way, telling you\nthe phase (marking vs sweeping) and the sub-phase, as well as keeping track of a\ncollection of useful counters. This is available in the\ninstrument branch.\n\nAging\n\nDamien has also implemented\naging in the minor heap.\nAging is a technique whereby objects stay in the minor heap for several minor\ncollections before being promoted to the major heap. The goal of aging is to\nreduce the amount of false promotion.\n\nBetter incrementalization\n\nSeveral of the stages of the collector have been made interruptible, including\nscanning of arrays and of the roots. The effect here is to reduce the worst-case\ndelays imposed by the collector. This is in the\nlow-latency branch.\n\nSeparating major slices from minor collections\n\nIn the stock runtime, major slices and minor collections are always done\ntogether. In the\nlow-latency branch,\nyou can run one without the other, and you can basically run them at any time.\nThis has a couple of advantages – one is that it’s essentially another form of\nincrementalization, allowing you to do less work per GC pause.\n\nThe other is that it gives you more freedom to schedule collections when you\nwant to. One way we’re looking at using this is to have an application-level job\nthat wakes up periodically, and does a heuristic check to see if the system\nappears busy. If it doesn’t, then it schedules some GC work, and it may choose\nto do either a minor collection or a major slice. 
A minor collection would only\nbe chosen in the case that the minor heap is bigger than some configured level,\nto avoid too much false promotion; but a major slice can be done at any\ntime.\n\nSmoothed work-accounting\n\nInstead of just keeping track of the amount of work that needs to be done in the\nnext major slice, the GC in the\nlow-latency branch\ntracks work that must be done over the next n major slices, by keeping these\nnumbers in a circular buffer.\n\nThe runtime also uses these buckets for keeping track of extra work that has\nbeen done by application-forced major slices. A forced major slice takes work\naway from the front-most bucket, potentially bringing the bucket to negative\nterritory.\n\nWhen the runtime checks if it needs to do a major slice, it looks at the first\nbucket. If it’s got a positive amount of work in it, then that work is done in\nthat slice, if possible. Whatever is left over (which may be positive or\nnegative) is spread out uniformly over the next n buckets.\n\nSegmented free lists\n\nA big part of the cost of minor collections is the cost of finding free blocks.\nOne observation is that in many OCaml applications, block sizes are quite small.\nOne way of taking advantage of this is to have a set of size-segregated\nfree-lists, for a configurable set of sizes. e.g., one could have a different\nfree list for blocks with 1, 2, 3 and 4 slots.\n\nThis is still ongoing (read: not working yet), but it will show up in the\nmulti-free-list\nbranch eventually.\n\nHow is it going?\n\nThis is all very much a work in progress, but the results so far have been quite\npromising. 
By using a version of the compiler with most of these changes and\nwith an application-driven job that forces major slices in quiet times, we were\nable to reduce tail latencies by a factor of 3 in a real production application.\nThat’s pretty good considering that we’ve done essentially no parameter tuning\nat this point.\n\nThat said, some of the results are less promising. We were somewhat disappointed\nto see that when doing more traditional batch jobs, aging didn’t provide a\nsignificant improvement in overall compute time. It seems like in many\napplications, aging saves some on promotion, but the minor collection itself\ngets a little more expensive, and these seem to nearly cancel out.\n\nThis seems especially surprising given that aging is present in most GCs,\nincluding those for Java’s HotSpot, the .NET CLR, and GHC. Given that everyone\nseems to use aging, I would have expected aging to have a quite noticeable\nbenefit for lots of workloads, not just carefully tuned packet processors.\n\nA call for help\n\nThe progress we’ve made so far is quite promising, but a lot of things are still\nup in the air. The reason that I wanted to post about it now is that I was\nhoping to hear feedback from others who have experience dealing with similar\nissues in other languages.\n\nSo, if you have thoughts about the techniques we’ve tried for making OCaml’s\nruntime more responsive, or suggestions for other techniques we should consider,\nplease comment!\n",
        "url"      : "https://blog.janestreet.com/building-a-lower-latency-gc/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Faster OCaml to C calls",
        "date"     : "April 9, 2015",
        "authorId" : "vbrankov",
        "author"   : "Vladimir Brankov",
        "tags"     : ["c","ocaml","performance","speed"],
        "minsToRead" : 2,
        "content"  : "The official OCaml documentation “Interfacing C with\nOCaml” doesn’t\ndocument some interesting performance features.\n\nC functions with no OCaml allocation\n\nA C call can allocate OCaml data and pass it back to OCaml, for example using\ncaml_copy_string(s). Between the C call allocating OCaml data and passing it\nback, it has to make sure that OCaml’s Garbage Collector doesn’t collect it, as\nthe Garbage Collector can be triggered during the C call. There’s an intricate\nmechanism which assures that, part of which are CAMLparam, CAMLlocal and\nCAMLreturn.\n\nThis mechanism can be bypassed if the C call is not going to allocate any OCaml\ndata. This can yield performance benefits especially in shorter functions. To\nbypass it, CAMLparam, CAMLlocal and CAMLreturn should not be used\nand the primitive should be declared with “noalloc”.\n\nFor example, OCaml’s compare is not smart to avoid branch\nmispredictions for floats.\nMoving comparison to C speeds it up a little bit. “noalloc” speeds it up a\nlot.\n\nfloat compare            8.93 ns\nfloat_compare_c          7.88 ns\nfloat_compare_c_noalloc  5.32 ns\n\nexternal float_compare_noalloc : float -&gt; float -&gt; int =\n  \"float_compare_noalloc_stub\" \"noalloc\"\n\nCAMLprim value float_compare_noalloc_stub(value vf, value vg)\n{\n  double f = Double_val(vf);\n  double g = Double_val(vg);\n  return Val_int((f &gt; g) - (f &lt; g) + (f == f) - (g == g));\n}\n\n\n\nC functions with float arguments and float return\n\nSince C code must\nuse boxed OCaml floats, any\nunboxed float must be boxed prior to the C call. This is not cheap, especially\nfor fast functions. 
This boxing can be avoided if the C call accepts and returns\nonly floats.\n\nFor example, float_min can be replaced with a single CPU instruction.\nUnfortunately, the C implementation is much slower because of boxing floats:\n\nfloat_min                  6.09 ns\nfloat_min_c               15.92 ns\n\nlet float_min (x : float) y = if x &lt; y then x else y\n\nCAMLprim value float_min_stub(value x, value y)\n{\n  CAMLparam2(x, y);\n  CAMLlocal1(v);\n  double z = Double_val(y);\n\n  __asm__ (\"minsd %1, %0;\" : \"+&x\"(z) : \"x\"(Double_val(x)));\n  v = caml_copy_double(z);\n  CAMLreturn(v);\n}\n\nexternal float_min_c : float -&gt; float -&gt; float = \"float_min_stub\"\n\n\n\nTo avoid boxing, the C function’s arguments and return type should be “double”,\nCAMLparam, CAMLlocal and CAMLreturn should be avoided, and the\nprimitive should include “float” and both interpreted and compiled\nimplementations:\n\nfloat_min_c_float          1.95 ns\n\nexternal float_min_inan_c_float : float -&gt; float -&gt; float\n  = \"float_min_inan_float_bytecode\" \"float_min_inan_float_stub\" \"float\"\n\nCAMLprim double float_min_inan_float_stub(double x, double y)\n{\n  double z = y;\n  __asm__ (\"minsd %1, %0;\" : \"+&x\"(z) : \"x\"(x));\n  return z;\n}\n\n\n\nC functions with float arguments and non-float return\n\nWe might be able to further speed up float_compare_noalloc if we avoided\nboxing. Alas, that function returns an integer, so it’s impossible to use\n“float”. Is it still possible to avoid boxing? The answer is yes, by simply\nconverting the float to an int.\n\nfloat_compare_c_float    3.73 ns\n\nCAMLprim double float_compare_float_stub(double f, double g)\n{\n  return (f &gt; g) - (f &lt; g) + (f == f) - (g == g);\n}\n\nexternal float_compare_float : float -&gt; float -&gt; float\n  = \"float_compare_float_bytecode\" \"float_compare_float_stub\" \"float\"\nlet float_compare_float x y = int_of_float (float_compare_float x y)\n\n\n",
        "url"      : "https://blog.janestreet.com/faster-ocaml-to-c-calls/",
        "image"    : null,
        "topic"    :  ["technology","c","ocaml","performance","speed"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Why GADTs matter for performance",
        "date"     : "March 30, 2015",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 15,
        "content"  : "When GADTs (Generalized Algebraic Data\nTypes) landed in\nOCaml, I wasn’t particularly happy about it. I assumed that it was the kind of\nnonsense you get when you let compiler writers design your programming language.\n\nWhich is to say that the standard GADT examples all seem to be about the kinds\nof things that compiler writers do, like embed domain-specific languages or\nbuild typed abstract-syntax trees. But it didn’t seem particularly relevant for\nthe kind of systems programming that I think about.\n\nBut it became apparent pretty quickly that I was wrong. In particular, since\nGADTs have landed, at Jane Street we’ve found lots of examples where GADTs are\nimportant for performance, of all things. The theme in these examples is that\nGADTs enable you to tweak your memory representations in ways that would\notherwise be painful or impossible to do safely in OCaml.\n\nThe Problem of Polymorphism\n\nI’d like to walk through a simple example that illustrates this aspect of GADTs,\nbut first, a few words about OCaml’s memory representation. OCaml’s polymorphism\nis in an important way backed on that memory representation. In particular,\nconsider a simple polymorphic function like List.iter, which has the following\ntype:\n\nval iter: 'a list -&gt; f:('a -&gt; unit) -&gt; unit\n\n\n\nThe polymorphic type tells you that List.iter can operate on lists of any\ntype, and in OCaml, this is achieved with a single compiled version of the code.\nThis is possible because the memory representation of the elements of a list are\nuniform: you can always refer to an OCaml value in a single word, either as a\npointer to a heap-allocated value, or as an immediate that fits inside that\nword.\n\nThat means that some OCaml datatypes are less efficient space-wise than you\nmight imagine. Arrays, for example, take the same amount of space per element\nwhether those elements are bytes, 32-bit ints, or 64-bit ints. 
(There’s actually\nsome special magic in the compiler for float arrays, though this is probably\nmore trouble than it’s worth, as described by Alain Frisch\nhere. But let’s ignore\nfloat arrays for now.)\n\nOCaml does have a tighter representation for byte arrays, called bytes. But\nit’s a completely different type, and so building a general purpose data\nstructure that uses bytes when it would make sense, and ordinary arrays\notherwise, is a little awkward.\n\nControlling memory representation without GADTs\n\nLet’s see what happens if we try to design (without GADTs) an array type that\nsometimes uses the general array representation and sometimes uses bytes.\n\nYou could imagine representing such a value using an ordinary variant.\n\ntype 'a t = | Array of 'a array\n            | Bytes of bytes\n\n\n\nWe could then implement each of the operations we want on our new array type,\nimplementing each one differently depending on the particular\nrepresentation. Let’s see what happens if we just take this idea and run with\nit, implementing all the required functions in the most straightforward way.\n\n&gt; module Compact_array = struct\n\n    type 'a t = | Array of 'a array\n                | Bytes of bytes\n\n    let of_bytes x : char t = Bytes x\n    let of_array x = Array x\n\n    let length = function\n      | Array a -&gt; Array.length a\n      | Bytes s -&gt; Bytes.length s\n\n    let get t i =\n      match t with\n      | Array a -&gt; a.(i)\n      | Bytes s -&gt; Bytes.get s i\n\n    let set t i v =\n      match t with\n      | Array a -&gt; a.(i) &lt;- v\n      | Bytes s -&gt; Bytes.set s i v\n\n  end;;\n\nmodule Compact_array :\n  sig\n    type 'a t = Array of 'a array | Bytes of bytes\n    val of_bytes : bytes -&gt; char t\n    val of_array : 'a array -&gt; 'a t\n    val length : 'a t -&gt; int\n    val get : char t -&gt; int -&gt; char\n    val set : char t -&gt; int -&gt; char -&gt; unit\n  end\n\n\n\nThis seems pretty good at first glance, but the inferred 
types aren’t quite what\nwe want. In particular, get and set only work with Compact_arrays\ncontaining characters. If you think about how type inference works, it’s not\nreally all that surprising. Consider the code for get:\n\nlet get t i =\n  match t with\n  | Array a -&gt; Array.get a i\n  | Bytes s -&gt; Bytes.get s i\n\n\n\nThe OCaml compiler is looking for a single type to assign to the return value\nfor all the cases of the match. Given that Bytes.get always returns a char,\nthen Compact_array.get will be restricted to only returning a char.\n\nOne way to work around this problem is to essentially implement what we want as\na poor-man’s object. Here, we just write the code separately for the different\ncases, and stuff those functions into a record full of closures. Here’s how that\nlooks.\n\n&gt; module Compact_array = struct\n\n  type 'a t = { len: unit -&gt; int\n              ; get: int -&gt; 'a\n              ; set: int -&gt; 'a -&gt; unit\n              }\n\n  let of_string s =\n    { len = (fun () -&gt; Bytes.length s)\n    ; get = (fun i -&gt; Bytes.get s i)\n    ; set = (fun i x -&gt; Bytes.set s i x)\n    }\n\n  let of_array a =\n    { len = (fun () -&gt; Array.length a)\n    ; get = (fun i -&gt; Array.get a i)\n    ; set = (fun i x -&gt; Array.set a i x)\n    }\n\n  let length t = t.len ()\n  let get t i = t.get i\n  let set t i x = t.set i x\n\nend;;\nmodule Compact_array :\n  sig\n    type 'a t = {\n      len : unit -&gt; int;\n      get : int -&gt; 'a;\n      set : int -&gt; 'a -&gt; unit;\n    }\n    val of_string : bytes -&gt; char t\n    val of_array : 'a array -&gt; 'a t\n    val length : 'a t -&gt; int\n    val get : 'a t -&gt; int -&gt; 'a\n    val set : 'a t -&gt; int -&gt; 'a -&gt; unit\n  end\n\n\n\nThis more or less solves the problem, but it’s still not really the memory\nrepresentation we want. 
In particular, we have to allocate three closures for\neach Compact_array.t, and this number of closures will only go up as we add\nmore functions whose behavior depends on the underlying array.\n\nGADTs to the rescue\n\nLet’s go back to our failed variant-based implementation, but rewrite it using\nthe GADT syntax. Note that we’re not trying to change the types at all this\ntime, just rewriting the same type we had before in the language of GADTs.\n\ntype 'a t = | Array : 'a array -&gt; 'a t\n            | Bytes : bytes -&gt; 'a t\n\n\n\nThe syntax of this declaration suggests thinking about a variant constructor like\nArray or Bytes as a function from the constructor arguments to the type of\nthe resulting value, with the thing to the right of the : roughly\ncorresponding to the type signature of the constructor.\n\nNote that for the Array constructor, the value of 'a depends on the\ntype of the argument:\n\n&gt; Array [|1;2;3|];;\n- : int t = Array [|1; 2; 3|]\n&gt; Array [|\"one\";\"two\";\"three\"|];;\n- : string t = Array [|\"one\"; \"two\"; \"three\"|]\n\n\n\nBut for the Bytes constructor, the type parameter 'a is still free.\n\n&gt; Bytes \"foo\";;\n- : 'a t = Bytes \"foo\"\n\n\n\nThis is really the problematic case, because for Bytes \"foo\" we’d like the\nparameter 'a to be char, since in the Bytes case, that’s what the element\ntype of our array is.\n\nBecause GADTs give us the ability to specify the type on the right-hand side of\nthe arrow, we can get that.\n\ntype 'a t = | Array : 'a array -&gt; 'a t\n            | Bytes : bytes -&gt; char t\n\n\n\nNow, the Bytes constructor behaves as we’d like it to.\n\n&gt; Bytes \"foo\";;\n- : char t = Bytes \"foo\"\n\n\n\nNow let’s see what happens when we try to write the length function.\n\n&gt; let length t = \n     match t with\n     | Bytes b -&gt; Bytes.length b\n     | Array a -&gt; Array.length a\n  ;;\nval length : char t -&gt; int = &lt;fun&gt;\n\n\n\nDisappointingly, we’re again stuck with a 
function that doesn’t have the right\ntype. In particular, the compiler has decided that this function can only\noperate on char t, when we want it to work for arrays of any type.\n\nBut the problem now is that type inference in the presence of GADTs is\ndifficult, and the compiler needs a little help. Roughly speaking, without some\nhints, OCaml’s type system will try to identify all types as having a single\nvalue within a given function. But in this case, we need a type variable which\nmight have different values in different branches of a match statement.\n\nWe can do this by creating a locally-abstract type el to represent the type\nparameter of t (and the element type), and annotating t accordingly.\n\n&gt; let length (type el) (t:el t) = \n     match t with\n     | Bytes b -&gt; Bytes.length b\n     | Array a -&gt; Array.length a\n  ;;\nval length : 'a t -&gt; int = &lt;fun&gt;\n\n\n\nNow we see that we get the right type. We can push this approach through to get\na complete implementation.\n\n&gt; module Compact_array = struct\n\n    type 'a t = | Array  : 'a array -&gt; 'a t\n                | Bytes : bytes -&gt; char t\n\n    let of_bytes x = Bytes x\n    let of_array x = Array x\n\n    let length (type el) (t:el t) =\n      match t with\n      | Array a -&gt; Array.length a\n      | Bytes s -&gt; Bytes.length s\n\n    let get (type el) (t:el t) i : el =\n      match t with\n      | Array a -&gt; Array.get a i\n      | Bytes s -&gt; Bytes.get s i\n\n    let set (type el) (t:el t) i (v:el) =\n      match t with\n      | Array a -&gt; Array.set a i v\n      | Bytes s -&gt; Bytes.set s i v\n\n  end;;\nmodule Compact_array :\n  sig\n    type 'a t = Array : 'a array -&gt; 'a t | Bytes : bytes -&gt; char t\n    val of_bytes : bytes -&gt; char t\n    val of_array : 'a array -&gt; 'a t\n    val length : 'a t -&gt; int\n    val get : 'a t -&gt; int -&gt; 'a\n    val set : 'a t -&gt; int -&gt; 'a -&gt; unit\n  end\n\n\n\nAs I said at the beginning, this is really 
just an example of the more general\ntheme. GADTs are about more than clever typed interpreters; they’re a powerful\nmechanism for building easy-to-use abstractions that give you more precise\ncontrol of your memory representation. And getting the right memory\nrepresentation is often critical for building high-performance applications.\n",
        "url"      : "https://blog.janestreet.com/why-gadts-matter-for-performance/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "A lighter Core",
        "date"     : "March 21, 2015",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 6,
        "content"  : "We recently released a version of our open source libraries with a much\nanticipated\nchange\n– Async_kernel, the heart of the Async concurrent programming library, now\ndepends only on Core_kernel rather than on Core.\n\nThis sounds like a dull, technical change, and it kind of is. But it’s also part\nof a larger project to make our libraries more lightweight and portable, and so\nsuitable for a wider array of users and applications.\n\nWe’ve actually been working on these issues for a while now, and this seems like\na good time to review some of the changes we’ve made over the years, and what’s\nstill to come.\n\nReorganizing for portability\n\nCore has always had dependencies on Unix, including OCaml’s Unix library, as\nwell as some other parts of the Unix environment, like the Unix timezone files.\nThis has long been a problem for porting to Windows, but more recently, the\nissue has loomed for two other increasingly important platforms for\nOCaml: Javascript\nand Mirage.\n\nTo help fix this problem, in 2013 we released a library called Core_kernel,\nwhich is the portable subset of Core that avoids Unixisms as well as things like\nthreads that don’t match well with the Javascript and Mirage back-ends.\n\nIn the same vein, we refactored Async, our concurrent programming library, into\na set of layers (modeled on the design of the similar Lwt library) that both\nclarified the design and separated out the platform specific bits. Async_kernel\nis the lowest level and most portable piece, hosting the basic datastructures\nand abstractions. Async_unix adds a Unix-specific scheduler, and Async_extra\nbuilds further os-specific functionality on top.\n\nUntil recently, the fly in this ointment was that Async_kernel still depended\non Core, rather than Core_kernel, because only Core had a time library. 
Making\nAsync_kernel only require Core_kernel was a bigger project than you might\nimagine, in the end leading us to change\nTiming_wheel,\na core datastructure for Async and several other critical libraries at Jane\nStreet, to use an integer representation of time instead of the float-based one\nfrom Core.\n\nAlready, some experiments are underway to take advantage of this change,\nincluding some internal efforts to get Async working under javascript, and\nexternal efforts to get cohttp’s Async\nback-end to only depend on Async_kernel.\n\nI’m hoping that yet more of this kind of work will follow.\n\nModule Aliases\n\nOne long-running annoyance with OCaml is the lack of an effective namespace\nmechanism. For a long time, the only choice was OCaml’s packed modules, which\nlet you take a collection of modules and merge them together into one\nmega-module. Some kind of namespace mechanism is essential at scale, and so we\nused packed modules throughout our libraries.\n\nUnfortunately, packed modules have serious downsides, both in terms of\ncompilation time and executable sizes. We’ve been talking to people about this\nand looking for a solution for a long time. You can check out this epic\nthread on\nthe platform list if you want to\nsee some of the ensuing conversation.\n\nA solution to this problem finally landed in OCaml 4.02, in the form of module\naliases. I’ll skip the detailed explanation (you can\nlook here if you want to learn\nmore), but the end result was great: our compilation times immediately went down\nby more than a factor of 3, and it gave us a path towards dropping packed\nmodules altogether, thus reducing executable sizes and making incremental\ncompilation massively more efficient.\n\nThe work on dropping packed modules has already landed internally, and will\nhopefully make it to the external release in a few months. 
The benefit to\nexecutable size is significant, with typical executables dropping in size by a\nfactor of 2, but there is more to do. OCaml doesn’t have aggressive dead code\nelimination, and that can lead to a lot of unnecessary stuff getting linked in.\nWe’re looking at some improvements we can make to cut down the dependency tree,\nbut better dead code elimination at the compiler would really help.\n\nSharing basic types\n\nInteroperability between Core and other OCaml libraries is generally pretty\ngood: Core uses the same basic types (e.g., string, list, array, option) as\nother OCaml code, and that makes it pretty easy to mix and match libraries.\n\nThat said, there are some pain points. For example, Core uses a Result type\n(essentially, type ('a,'b) result = Ok of 'a | Error of 'b) quite routinely,\nand lots of other libraries use very similar types. Unfortunately, these\nlibraries each have their own incompatible definitions.\n\nThe solution is to break out a simple type that the different libraries can\nshare. After some discussion with the people behind some of the other libraries\nin question, I made a pull request to\nthe compiler to add a result type to the stdlib.\n\nThis is a small thing, but small things matter. I hope that by paying attention\nto this kind of small issue, we can help keep interoperability between Core and\nthe rest of the OCaml ecosystem smooth.\n\nEliminating camlp4\n\nOne concern I’ve heard raised about Core and Jane Street’s other libraries is\ntheir reliance on camlp4. camlp4 is a somewhat divisive piece of infrastructure:\nit’s long been the only decent way to do metaprogramming in OCaml, and as such\nhas been enormously valuable; but it’s also a complex and somewhat unloved piece\nof infrastructure that lots of people want to avoid.\n\ncamlp4 also makes tooling a lot more complicated, since there’s no single syntax\nto target. 
Dev tools like ocp-indent\nand the excellent merlin have\nsome terrible hacks to support some of the most common camlp4 syntax extensions,\nbut the situation is clearly untenable.\n\nYou do need camlp4 to build Core, but you don’t need camlp4 to use it, and in\npractice, that’s good enough for most use cases. But for people who want to\navoid camlp4 entirely, it’s still a nuisance. Moreover, while you don’t need\ncamlp4 to use Core, it is convenient. For example, a lot of Core’s idioms work\nbest when you provide s-expression serializers for your types, and the\nsexplib syntax extension is an awfully\nconvenient way to generate those functions.\n\nOur plan is to simply eliminate our dependency on camlp4 entirely over the next\n6 months, by switching to using ppx and extension\npoints,\na new approach to metaprogramming in OCaml that, like module aliases, landed in\n4.02. We’re currently rewriting all of our syntax extensions, and building tools\nto automatically migrate the code that depends on camlp4. People who want to\ncontinue to use the old camlp4 extensions are welcome to continue doing so, but\nwe’re cutting our dependency on them.\n\n\n\nEven at the end of all this, we don’t expect that Core and Async will suit\neveryone – that’s a hard bar to cross for any software package. But we do hope\nthat through these efforts, an ever wider set of developers will be able to take\nadvantage of the work we’ve done.\n",
        "url"      : "https://blog.janestreet.com/a-lighter-core/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Centralizing distributed version control, revisited",
        "date"     : "March 4, 2015",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 8,
        "content"  : "7 years ago, I wrote a blog\npost\nabout how we at Jane Street were using our distributed version control system\n(hg, though the story would be the same for git) in a partially centralized\nway. Essentially, we built a centralized repo and a continuous integration\nsystem whose job was to merge in new changesets. The key responsibility of this\nsystem was to make sure that a change was rejected unless it merged, compiled\nand tested\ncleanly.\n\nThis half-distributed, half-centralized approach let us enjoy the benefits of a\nDVCS, while still getting the coherence and easy of sharing that comes from\nhaving a central authoritative source.\n\nSince then, our development tools have changed a lot, including the arrival of a\nnew code review and release management system called\nIron.\nIn writing Iron we discovered that centralization was valuable in ways we hadn’t\nconsidered before. In particular, despite the fact that good support for merging\nis central to a DVCS, centralization is actually a critical ingredient to making\nmerges work better.\n\nTo understand how centralization can help, let’s talk about one reason why\nmerging is a fraught process to begin with.\n\nThe criss-cross merge\n\nThe basic approach to merging in a DVCS like hg or git is pretty simple.\nHere are the basic steps that are taken to merge two heads A and B.\n\n\n  Find the greatest common ancestor (GCA(A,B)) of the heads to be merged.\n  Compute the patch from that base point to one of the two heads, say, A.\n  Take the patch you just computed, and apply it to B. Conflicts appear when\nthe patch, which was actually based on GCA(A,B), doesn’t apply cleanly to\nB. The result of this process is the merge.\n\n\nThe above discussion oversimplifies the story by assuming there’s a well defined\nGCA, but this just isn’t always true. 
To see why, consider a repository starting\nwith a root revision R, and two revisions made independently on top of R.\n\n  A\n /\nR\n \\\n  B\n\n\n\nNow, imagine that two different developers each concurrently decide to merge the\nheads A and B and do some further development. Note that in both of the\ncases shown below, the GCA for the merge between A and B is R.\n\nDeveloper 1                Developer 2\n\n  A---C--D                    A     \n /   /                       / \\        \nR   /                       R   \\       \n \\ /                         \\   \\  \n  B                           B---E--F\n\n\n\nNow, if we bring these two separate histories together into one repo, we have\nsomething like this.\n\n  A---C--D\n / \\ /\nR   \\\n \\ / \\\n  B---E--F\n\n\n\nNow, what happens if we want to merge D and F? In particular, what is\nGCA(D,F)? Both A and B are common ancestors, but neither one is greater\nthan the other. In this case, there are in some sense two different GCAs, or,\nmore precisely, there are multiple maximal common ancestors, or MCAs. This\ncase is often described as a criss-cross merge, and is the source of much\nwailing and gnashing of teeth among developers and users of DVCSs.\n\ngit and hg have different ways of dealing with the case of multiple MCAs. By\ndefault, hg just picks one of the MCAs arbitrarily and does the merge based on\nthat. Given that different choices of the merge base will lead to different\nresults, making that choice arbitrarily is pretty disturbing.\n\ngit, on the other hand, has a strategy called recursive merge that repeatedly\nmerges together the MCAs, and then uses that merged MCA as the basis for\ncomputing the diffs to A and B on which the final merge will be based. 
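To make the ambiguity concrete, here is a small Python sketch (purely illustrative, and not how git or hg actually compute merge bases) that finds the maximal common ancestors in the combined criss-cross history above:

```python
# Purely illustrative: compute the maximal common ancestors (MCAs) of two
# revisions in the criss-cross history described above. Not how git or hg
# actually implement merge-base computation.

# Parent pointers: C and E are the two independent merges of A and B.
PARENTS = {
    "R": [],
    "A": ["R"],
    "B": ["R"],
    "C": ["A", "B"],  # developer 1's merge of A and B
    "D": ["C"],
    "E": ["A", "B"],  # developer 2's merge of A and B
    "F": ["E"],
}

def ancestors(rev):
    """All ancestors of rev, including rev itself."""
    seen, stack = set(), [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(PARENTS[r])
    return seen

def mcas(x, y):
    """Common ancestors of x and y with no descendant that is also a
    common ancestor -- i.e., the maximal common ancestors."""
    common = ancestors(x) & ancestors(y)
    return {c for c in common
            if not any(c != d and c in ancestors(d) for d in common)}

print(mcas("D", "F"))  # both A and B are maximal, so there is no unique GCA
```

Running it on the merge of D and F yields both A and B, which is exactly the ambiguity that the strategies above have to resolve somehow.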
And\nhg has a new strategy called bid\nmerge that\nis willing to make different choices as to the GCA to use on a file by file\nbasis.\n\nNone of these approaches amount to principled solutions, and while they work\nbetter in some cases and worse in others, they all sometimes lead to bad\nresults. It’s tempting to look for a way out of this conundrum altogether, by\navoiding the possibility of criss cross merges in the first place.\n\nAvoiding the criss-cross merge\n\nFor those who haven’t read my previous posts about how Iron approaches\nmerges,\nI’ll describe it briefly here. Iron organizes its branches into a hierarchy:\nevery repository has a root feature, and that feature can have children, and\nthose can have children as well. Thus, our main repository, called Jane, has a\nroot feature called jane, and one can develop changes to jane in child\nfeatures, such as jane/Core.Applicative or jane/quickcheck.\n\nCritically, the merging of features is constrained. Note that in Iron, every\nfeature is defined by its base and tip revision, where the diff between\nthose two revisions is effectively the contents of the feature. Here are some of\nthe key operations allowed on features.\n\n\n  fe release, moves changes from a child feature into a parent. This can\nonly be done once the child feature is fully merged with its parent, and has\nthe effect of setting the tip of parent to be the tip of the child, and\ntypically deleting the child.\n\n\nAs an example, if the jane/quickcheck feature is based at the current tip of\njane (and is fully reviewed, and all its tests pass), then calling\nfe release jane/quickcheck will move the tip of jane forward to be equal to\nthe tip of jane/quickcheck, and will delete jane/quickcheck.\n\n\n  fe rebase lets you merge a feature with its parents, effectively pulling\nchanges from a parent feature into a child. 
This has the effect of changing\nthe base of the feature to be the tip of its parent, and the tip of the\nfeature to be the result of the merge.\n\n\nSo, if other features have been released into jane since the\njane/Core.Applicative feature was created, then the base of\njane/Core.Applicative will no longer be the tip of jane. Calling\nfe rebase jane/Core.Applicative will merge the tip of jane/Core.Applicative\nwith the tip of jane, and will set the base of jane/Core.Applicative to the\ntip of jane.\n\n\n  fe rename, which in addition to allowing you to simply change the name of\na feature, also lets you introduce a parent-child relationship between\nfeatures that didn’t previously have one. e.g., calling\nfe rename jane/Core.Applicative   jane/quickcheck/Core.Applicative causes\nthe Core.Applicative feature to become a child of, and so be able to\ndepend on the changes in, the quickcheck feature.\n\n\nAll of these operations are implemented against a single, centralized server\nwhich keeps track of the state of all our features. This centralization lets\nIron enforce some useful invariants along the way, critically, that the GCA of a\nfeature and its parent is well defined, and is equal to the base of the feature.\nThis simple property turns out to outlaw criss-cross merges, which avoids all of\nthe mess we described earlier.\n\nThe happy outcome turns out to depend critically on the fact that we built a\ncentral server that could enforce the invariants in question or, more\nprecisely, that we built a consistent service.\n\nThough we discovered this benefit somewhat by chance, the existence of the\ncentral server is key to enforcing\nthe necessary invariant. In particular, the scenario of two different users\nconcurrently releasing into the same feature or rebasing the same feature simply\nisn’t possible when there’s a centralized monitor determining who goes first.\n\nIn retrospect, this shouldn’t be too surprising. 
The criss-cross merge is really\na result of concurrency, and the idea that introducing a lock (which is what a\ncentralized server does for you) can be used to exclude unwanted concurrent\nexecutions in a distributed system should surprise no one.\n\nIn the end, you can trace it all back to the CAP\ntheorem: If you\nwant progress while partitioned, you need to give up on consistency in some way.\nAnd criss-cross merges are caused by a kind of inconsistency.\n\nCentralization obviously has downsides, but I think Iron picks a nice point\nalong the spectrum here: writing code is totally doable while disconnected, but\noperations like rebase and release that affect how information is shared between\nfeatures require you to be connected. I think it’s a small price to pay to\nnever have to deal with a criss-cross merge.\n",
        "url"      : "https://blog.janestreet.com/centralizing-distributed-version-control-revisited/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Making making better",
        "date"     : "January 31, 2015",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "We spend a lot of time and effort on training new people, and it never stops for\nlong. Right now our winter-intern class is ending; in five months we’ll have a\nslew of new interns to get up to speed, and a few months after that we’ll have\nan incoming class of new hires.\n\nA big part of our new-hire training is OCaml Bootcamp, a month-long immersion\nprogram for non-dev hires (mostly trading, but some from other areas like\nresearch, systems, and accounting). We don’t think everyone at Jane Street\nshould be a full-on software developer, but writing code is such a useful way to\nget things done that it’s worth teaching the basics to a broad set of people.\n\nTeaching programming, especially to people who are not planning on becoming\nsoftware developers, is an opportunity to reflect on how unnecessarily hard\nprogramming can be. There’s a huge learning curve as you struggle to learn your\nway around the Unix command-line, or figure out the key commands for controlling\na 1970’s era text editor like Emacs or Vi, or puzzle through the semantics of a\ndistributed version control system like Mercurial. And all of that is before you\neven start writing code!\n\nTo me, this serves as a reminder of the importance of good tools. The quality of\nyour tools can increase or decrease the steepness of the learning curve, and\nthey also affect your day-to-day efficiency after you’ve climbed up that hill.\n\nTools are easy to undervalue. Most of our development time is spent, as it\nshould be, on our first order problems – writing the code that powers the\nsystems that let us trade. And the connection between better tools and better\ntrading can seem tenuous.\n\nBut really it’s not tenuous at all. 
If you spend all your time on first order\nproblems, you’ll discover you’re not solving them as fast as you should be.\nGetting things done effectively requires optimizing your own productivity, and\nto do that, you have to spend some time sharpening your tools.\n\nAnd we’ve done a fair bit of sharpening. One recent example\nis Iron,\na new\ncode review\nand release management system that we\nstarted using last summer. Last year, we also rolled out a new build system\ncalled Jenga, which greatly simplified\nand sped up the process of compiling our code. Plus, we switched to\na new version of OCaml, which includes a big set\nof improvements, some of which\nwere specifically aimed at improving our development process. And we\nfunded some former interns to\nimprove Merlin, a fantastic tool\nthat provides IDE-like features like context-sensitive autocompletion in a way\nthat can be easily integrated into multiple editors.\n\nJane Street is a pretty small shop – we have fewer than 65 full-time developers\n– but even at our modest scale, spending time on tools is a big win. But it’s\nreally about more than dev tools. Thinking about how to make the people around\nyou more effective informs how you work more generally, changing how you design\nlibraries, how you manage services, and how (and whether!) you write\ndocumentation.\n\nAnd in addition to making for a more effective organization, it’s also a more\npleasant way to live.\n",
        "url"      : "https://blog.janestreet.com/making-making-better/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "13 Virtues",
        "date"     : "January 2, 2015",
        "authorId" : "dpowers",
        "author"   : "David Powers",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "Very early on in his life, while on lengthy voyage from London to Philadelphia,\nBen Franklin created a system of thirteen virtues to live his life by. He spent\nthe remainder of his days giving special focus to one virtue per week in a 13\nweek cycle, as well as noting the virtues he failed to live up to at the end of\neach day.\n\nOver time he credited the system with making him more productive and more\nfulfilled.\n\nMy aspirations are not so lofty, but in the spirit of the new year, I present\nBen’s thirteen virtues as they relate to code review and discussion. Nothing\nhere is meant to be taken as gospel, but together they give me a path towards\nthe type of collaboration we value at Jane Street.\n\nMy simple hope is to waste less time, produce better code, and have fewer\narguments over the next 12 months than I did over the last.\n\nTemperance\n\nReview thoroughly, but not to the point of exhaustion. Be mindful of the limits\nof review.\n\nSilence\n\nSay only things that clearly benefit the code; avoid trifling comments and\ntangents.\n\nOrder\n\nCreate the structure (files, modules and types) necessary to give every concept\na clear place. Be suspicious of catchall modules.\n\nResolution\n\nRespond to feedback and change requests quickly. Momentum is important.\n\nFrugality\n\nDon’t waste people’s time with frivolous review, careless comments, or code that\nisn’t ready for review. Attention is expensive.\n\nIndustry\n\nPrefer to respond with working code over additional commentary. Focus review on\nimmediately productive outcomes instead of uncertain concerns.\n\nSincerity\n\nCome to discussions with an innocent mind. Engage in code review with the clear\ngoal of helping.\n\nJustice\n\nWeigh code decisions on the evidence at hand today, and not on personal\npreferences, prejudices, or obsolete past constraints.\n\nModeration\n\nAvoid extremes in both style and approach. 
Incorporate strong views slowly.\n\nCleanliness\n\nSpend time to lay out code in a clear and visually pleasing way. When reviewing,\nleave the code neater than you found it.\n\nTranquility\n\nDon’t become impassioned or incensed over trifles. Engage in all conversation\nwith an open balanced tone and a sense of patience.\n\nChastity\n\nProliferate new ideas through the code base cautiously and be aware that even a\ngood idea may not work in all places.\n\nHumility\n\nTake a modest view of your own contributions and feedback. Be unpretentious and\nrespectful in your comments. Accept that you may be wrong.\n",
        "url"      : "https://blog.janestreet.com/13-virtues/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Inspecting the Environment of a Running Process",
        "date"     : "December 1, 2014",
        "authorId" : "cperl",
        "author"   : "Chris Perl",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "Sometimes its useful to be able see the values of environment variables in\nrunning processes. We can use the following test program to see how well we can\naccomplish this:\n\n#define _GNU_SOURCE\n#include &lt;unistd.h&gt;\n#include &lt;stdio.h&gt;\n#include &lt;stdlib.h&gt;\n#include &lt;string.h&gt;\n\nint main(int argc, char **argv)\n{\n        int n;\n        char *envstr;\n        while((n = scanf(\"%as\", &envstr)) != EOF) {\n                putenv(envstr);\n        }\n        return 0;\n}\n\n\n\nThis program just reads strings from stdin and then basically passes them on to\nputenv(3) so we have any easy way to modify our environment.\n\nNow, lets run it with env -i to reset the environment to something pretty\nsparse:\n\n[cperl@localhost ~]$ gcc -Wall t.c -o t\n[cperl@localhost ~]$ env -i FOO=bar ./t\n\n\n\nFirst, lets see what we can get out of /proc/{pid}/environ, as googling for\nthis problem will undoubtedly point you in this direction (including ps eww\nwhich reads /proc/{pid}/environ):\n\n[cperl@localhost ~]$ cat /proc/$(pgrep -x t)/environ | xargs -r0 -n1 echo\nFOO=bar\n\n\n\nGreat, so that looks like its our answer!\n\nUnfortunately, /proc/{pid}/environ only reflects the environment of the\nprocess when it started and does not reflect any calls that process might have\nmade to putenv(3) or setenv(3) (you can experiment with the above program\nsubstituting in setenv(3) for putenv(3) and playing with overwrite to see\nwhat you get).\n\nWe can see that if we feed some data into our program, causing calls to\nputenv(3):\n\n[cperl@localhost ~]$ env -i FOO=bar ./t\nBAR=baz\n\n\n\nAnd then check /proc/{pid}/environ again:\n\n[cperl@localhost ~]$ cat /proc/$(pgrep -x t)/environ | xargs -r0 -n1 echo\nFOO=bar\n\n\n\nHowever, we can verify the data is really there if we attach with gdb and\niterate over the environ(7) array directly:\n\n[cperl@localhost ~]$ gdb ./t $(pgrep -x t)\n...\n(gdb) set $i = 0\n(gdb) while (environ[$i] != 0)\n 
&gt;print environ[$i++]\n &gt;end\n$1 = 0x7fffc8e42fec \"FOO=bar\"\n$2 = 0x12d4080 \"BAR=baz\"\n\n\n\nUnfortunately, I’m not aware of any other way to get this “dynamic environment\ninfo” (except for other ptrace-based solutions). Obviously attaching to\nproduction processes with gdb (or ptrace in general) isn’t a great idea.\nMost of the time you’ll probably be fine inspecting /proc/{pid}/environ and\nverifying (via source code inspection) that the process you care about doesn’t\nmake any calls to putenv(3) or setenv(3) for the variables whose values you\nare interested in.\n\nIf you have any better ideas about how to get this information, please share in\nthe comments!\n",
        "url"      : "https://blog.janestreet.com/inspecting-the-environment-of-a-running-process/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "How to choose a teaching language",
        "date"     : "November 17, 2014",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 4,
        "content"  : "If you were teaching a programming course, what language would you teach it in?\n\nI like this question because it has any number of good answers, each quite\ndifferent from the other, and each emblematic of a different approach to what\nprogramming is about.\n\nThe first formal programming class I took was COS\n217 at Princeton,\ntaught by the excellent (and at the time, I thought, terrifying) Anne\nRogers. The course was (and is) taught in\nC, and the intellectual approach of the class was to start from the machine.\nInstead of just learning to program in C, we learned about how the machines we\nwere programming to worked. That was where I first encountered instruction\ncounters, stack frames, registers and the memory hierarchy. It was exhilarating.\n\nWhere C encourages you to start with the machine, Scheme wants you to start\nat the mathematical foundations of computation. You don’t need to know what the\nlambda caluclus is to appreciate Scheme’s slim core, and the way in which you\ncan build a rich and vibrant computational world on top of it. That core is\nexpressive enough that it makes it natural to introduce ideas that come up in\nmultiple different languages, including functional and imperative techniques,\nobject orientation, and logic programming.\n\nThe classic course in this vein is MIT’s\n6.001,\nalso known as SICP, The Structure and Interpretation of Computer Programming.\nSadly, 6.001 has been retired at MIT, but the\nbook lives on, and is\na worthwhile read even if you took your last CS course years ago.\n\nMIT replaced SICP with a course based on Python, and this reflects a broader\ntrend. As was highlighted by an informal\nstudy\nby Philip Guo, lots of schools now teach Python, particularly for early\nintroductory courses. I have mixed feelings about this choice. 
Python is a\nwonderfully friendly language, but that friendliness is bundled together with\nsome problems.\n\nThis was made apparent to me in part by my experience with students who chose to\ncode in their interviews in Python. In many ways, Python is the ideal interview\nlanguage, since its concise and readable syntax makes the constraints of coding\non the whiteboard more bearable. But what I saw was that students who learn\nPython often walk away with a rather rough model of the semantics of the\nlanguage. You might be surprised at what fraction of students who have\nprogrammed extensively in Python can’t guess how Python lists might be\nimplemented, to say nothing of their ability to explain the semantics of\nlanguage features like generators or decorators.\n\nThis isn’t really a knock on Python. After all, there’s something great about a\ntool that lets you get things done without fully understanding how it works. But\nin different ways, both Scheme and C encourage you to understand what’s going on\nfrom the ground up, and there’s a certain pedagogical power in that. All in, I\nthink Python is a great choice for an early introductory course, particularly\none meant for those who aren’t going to end up as computer scientists or\nfull-time programmers. But I’d be leery of using it outside of those\ncircumstances.\n\nA development that I’m personally rather encouraged by is the emergence of\nstatically typed functional languages, ML in particular, as teaching tools.\nOver the last few years, I’ve had the pleasure of visiting\nand lecturing at a number of schools that teach\nusing OCaml or SML, including\nBrown, Cornell, Penn, Princeton, CMU and Harvard.\n\nML has gained ground for good reasons. First, it shares much of Scheme’s elegant\nintellectual foundations, even though its core isn’t quite as charmingly\nminimalistic as Scheme’s. But ML has a wider range than Scheme because it allows\nyou to show students the role of types in programming. 
Despite that greater\nrange, OCaml and SML are relatively simple languages, which matters even more\nfor teaching than it does for everyday use.\n\nThe only choice I’ve seen a lot of that I can’t reconcile myself to is Java.\nJava is of course a widely used industrial language, but that doesn’t mean it’s\na good teaching language. In my view, a key requirement for a teaching language\nis simplicity, and all the other choices I mentioned are simple in one way or\nanother: C is a minimal layer on top of the machine; Scheme and ML are based on\nsimple mathematical models of computation; and Python is a language that people\nfind simple to use.\n\nJava isn’t simple in any of these ways. It’s not particularly easy to get\nstarted with, as indicated by all the things you need to tell students to ignore\nrather than understand. (Yeah, public static void main, I’m looking at you!)\nNor does it have a simple and transparent execution model like C. And an elegant\ncomputational core calculus like the one at the heart of Scheme and ML is\nnowhere in sight. The only real advantage I see for Java is vocational, and that\ndoesn’t seem to me like argument enough.\n\nThe thing to consider when you’re picking a language to teach is that you’re not\njust picking a bit of infrastructure for your students to program with during\nthe course. You’re picking an intellectual frame through which they will see all\nthe lessons you teach them. And you should pick that frame with care.\n",
        "url"      : "https://blog.janestreet.com/how-to-choose-a-teaching-language/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Interviewing At Jane Street",
        "date"     : "October 24, 2014",
        "authorId" : "dpowers",
        "author"   : "David Powers",
        "tags"     : ["interviewing"],
        "minsToRead" : 9,
        "content"  : "Software Engineering Interviews at Jane Street\n\nWelcome to our version of the “Technical Interviews at insert your company here” post. This topic has been covered by a lot of people already, so I’m going to do my best to not repeat all of the copious advice already out there.\n\nInterviewing, for both the interviewers and the interviewee is just plain old hard work.  We spend a lot of time working on the process, and we know that you have spent years working on your skillset.\n\nEven so, sometimes all that hard work doesn’t work out - even with a great candidate - just due to the the awkwardness of the interview process itself: time is short, the questions are weirdly artificial, and of course people get nervous.  Programming for an audience on a whiteboard, a web browser, or even just on a computer that isn’t yours is a bit like playing the guitar with mittens on. It can put even an accomplished person off their game.\n\nMissing out on good people makes us sad.\n\nThat’s what this post is for. By talking a bit about what we’re looking for, the ways people do poorly, and how we think you might be able to prepare we hope we’ll reduce the mitten handicap - at least a bit.\n\nWhat Do We Look For?\n\nFrom our perspective, the main thing we want to figure out when we interview someone is: are they someone we want to work with?\n\nThat probably seems obvious and maybe even trite, but it’s a point that can easily get lost once an interview is underway. We think of our interviews as little simulations of what it’s like to work with the candidate; and while it may seem like the interview is all about solving the problem, it’s really not. We’re much more interested in learning about  how  you work.\n\nWe are looking for candidates who engage in clear and productive discussions about the problem at hand.  Ideally they quickly find the right technical level for the discussion - not too abstract and not too detailed.  
They cleanly describe the most important considerations, and lay out a good plan for dealing with them.  Where there are tradeoffs, they note them and explain which side they took and why.  They ask thoughtful questions about parts of the problem that are unclear.  Overall they show tenacity and curiosity; they want to engage with the problem and with the people they are working with.\n\nWe model our interviews in a way that allows us to simulate, as best we can in an hour, what it’s like to work with someone. That means we don’t particularly value seeing someone write union-find in 15 lines, or care if a candidate knows exactly how an AVL tree is written (though we do care about understanding tree balance at a higher level). Those kinds of things can be found on the internet.\n\nTo that end, we try to avoid algorithm bingo and puzzles with clever “aha” solutions. We prefer more open-ended problems that have several plausible angles of attack and maybe even no single best answer.  They give us more freedom to work together and to see how the candidates’ thinking plays out.\n\nThat sounds nice enough, but it’s a bit high-level and hand-wavy. So here’s a more concrete list of suggestions that follow from our overall approach.\n\nBe nice\n\nThe smartest, most technically clever person in the world won’t get hired at Jane Street if they aren’t courteous and pleasant to talk to.  Most problems at Jane Street are solved with a lot of collaboration and discussion, and the solutions often take days or weeks to work out, incorporating ideas from many many people.  The tenor and quality of those discussions is an essential part of getting to the right answer.\n\nBe clear\n\nAnd by clear, we mean simple and to the point. Use words and examples that get the core idea across to the widest technical audience possible.\n\nAvoid showy, highly technical or deeply obscure terms of art, especially if you don’t fully understand them. 
In the best case we’ll likely just ask exactly what you meant by “hylomorphism”, which wastes precious time. In the worst case it will become clear that you should have said “metamorphism” instead, which is just embarrassing for everyone involved.\n\nKnow what you don’t know\n\nNaturally we like it when candidates are good at solving the problems we put in front of them. But just as important - perhaps more important - is their ability to reason about their own  level  of understanding.\n\nIn other words, we really like people who can admit when they’re unsure and speak confidently when they have the basis to do so.  Be willing to say, “I don’t know” rather than guessing.  Tell us when you are fairly confident but still a little bit uncertain.  Get used to talking about not just the problem but also your own confidence in individual assumptions and statements.\n\nKnow your language\n\nCode is a wonderful language for communicating complex ideas.  It provides a concise and unambiguous way of expressing them. But, like any foreign language, it takes a lot of time and practice to get really comfortable.\nWe need you to be comfortable with it because we communicate ideas in code a lot.\n\nNow, comfortable doesn’t mean that you need to have memorized the spec for a language (assuming it even has one beyond the reference implementation). It means that you should be able to read and write code in your language of choice without constant access to reference materials for common things, such as:\n\n\n  Standard control structures (loops/if-then/etc.)\n  Function, module, class, type, etc. definitions\n  Common data types like arrays, lists, hash tables/maps/dictionaries\n  Exceptions and other error handling techniques\n  Reasonable code organization\n\n\nThat especially means you shouldn’t pick a language because you think it will make us happy.\n\nWe’d much prefer you use a language that you know well, and can prove your skills with.  
Similarly, when picking which features of the language to use, pick the ones you understand best.  We’re not going to be impressed with your elegant use of Python decorators if you don’t really understand what they do.\n\nWrite the code you said you would write\n\nWe love it when a plan comes together, but it’s extra hard to watch when a good plan falls apart on execution.  Once you discuss a plan of attack with your interviewer, do what you claimed you would do.  Don’t change the plan in the middle or drop it in favor of a better idea without some discussion.  You have a very limited amount of time, and executing a decent plan well is better than producing a Frankenstein’s monster of 3 different plans that doesn’t quite come to life.\n\nIf you do get partway through and start to lose faith, step back and talk about it. Explain exactly why you are concerned.  If there really is a fatal flaw and you’ve seen it, we’ll help you get out of the jam, and we’ll appreciate that you articulated it.  If it’s just not quite perfect we’ll likely encourage you to continue.\n\nReview CS 101\n\nWe’ve hired plenty of successful people who didn’t have a traditional college background in CS and we certainly don’t require a master’s or a PhD. 
That said, we need you to have a solid working knowledge of core computer science concepts, including:\n\n\n  Abstraction layers like functions, objects, and modules\n  Basic algorithms and data structures, including binary search, sorting, hashing, breadth/depth first search, hash tables, binary trees and heaps.\n  Techniques for estimating CPU and memory costs, including big-O notation.\n  Recursion and exhaustive case analysis\n\n\nSo if you can’t for the life of you recall what amortized analysis is and you can’t nimbly bang out the code for a depth-first search, it’s probably worth reviewing some of this material.\n\nThink about real computers\n\nDepending on your point of view it’s either a sad or beautiful fact that the most elegant code can only run on top of the giant jangly amalgam of parts and OS glue that is a real computer.  Unfortunately we need programs to actually run, so we need people who understand the inner workings of that behemoth.\n\nThis doesn’t mean that we quiz every candidate about deep kernel internals, or the differences between SDRAM vs SGRAM. But for some jobs in systems development we do expect a fair amount of detailed knowledge, and in general it’s a big plus if you can take into account things like cache effects, IO patterns, memory representations, and the capabilities of real CPUs.\n\nWhat We Don’t Look For\n\nLots of things.  But there are a few things that pop up time and again that people worry we are looking for that bear mentioning.\n\nWe aren’t looking for you to finish.  Our questions are often designed to be open ended enough that even the best people we’ve seen couldn’t answer them fully in the time allotted.  We want to keep the conversation going to learn everything we can, and we don’t expect that you’ll answer everything 100% perfectly.\n\nWe also don’t expect you to give the perfect answer or to deliver bug free code.  Coding in front of people is hard.  Coding with a time limit is hard.  
Coding, in general, is hard.\n\nWe don’t ask software engineers to do mental math, or math olympiad questions, or to contemplate logic puzzles about pirate tigers that only tell the truth despite what you might have read online.  Dev interviews are about programming, plain and simple.  That’s what we are hiring you to do and that’s what we expect you to demonstrate.  There are other jobs at Jane Street that do care about mental math and logic puzzles for good reasons.  Just not this one.\n\nWhat Can You Do To Prepare?\n\nThis part is short and sweet.  Practice.\n\nBuild something from scratch and on your own in a language you like.  Don’t stop short.  Build the whole thing.  Make yourself do the parts that are hard for you and don’t accept half-finished code just because you think you know how you would finish it.  Dive deep into the layers of the libraries and language that you built it upon, and understand how they work too.\n\nWhen you think you are done show it to the smartest people you know, get feedback, tear it down and build it again with what you’ve learned.\n\nRepeat with a new problem.\n\nWe are looking for people to build real things with us, and practice really does make perfect.\n\n",
        "url"      : "https://blog.janestreet.com/interviewing-at-jane-street/",
        "image"    : null,
        "topic"    :  ["technology","interviewing"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "What the interns have wrought: RPC_parallel and Core_profiler",
        "date"     : "October 16, 2014",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : ["internship"],
        "minsToRead" : 6,
        "content"  : "We’re in the midst of intern hiring season, and so we get a lot of questions\nabout what it’s like to be an intern at Jane Street. One of the things people\nmost want to know is what kind of projects they might work on as an intern.\n\nThat’s of course hard to answer precisely, since we don’t yet know what next\nsummer’s intern projects are going to be. But you can get a sense by seeing some\nof what interns have done in the past. To that end, I thought I’d describe a\ncouple of intern projects that were done this past summer.\n\nRpc_parallel\n\nRpc_parallel is a new library written by Todd Lubin (who will be returning to\nJane Street full-time next fall), aimed at making parallel programming in OCaml\nsimpler and safer.\n\nWriting programs that take advantage of multiple cores, and even of multiple\ncomputers, is of course a necessity. In OCaml, such parallelism is typically\nachieved by writing multiple single-threaded programs that use message passing\nto coordinate.\n\nWe have lots of parallel message-passing systems, but while these share a lot of\ninfrastructure (notably, the\nAsync_rpc\nlibrary), we’ve never really settled on a lightweight way to construct complete\nparallel programs. Instead, each such program tends to have its own little\nprocess coordination framework.\n\nA while back, we wrote a library called Async_parallel,\n(described here). Async_parallel does a lot of things\nright – in particular, it makes it easy to spin up and manage new processes and\ndistribute work between them.\n\nBut Async_parallel has a problem. It is based on OCaml’s marshal facility,\nwhich allows you to serialize arbitrary data, including functions, between\nprocesses. This sounds great, but it has a dark side: marshalling arbitrary\nfunctions turns out to be error prone. 
In particular, in order to send a\nfunction over the wire, you also have to send a copy of any data that that\nfunction implicitly depends on.\n\nUnfortunately, it’s hard to know what kind of data is hidden behind a function,\nwhich can cause a few different forms of trouble: it might send much more data\nthan you expect, it might fail unexpectedly if it hits one of the forms of data\nthat can’t be marshalled, or it might lead to crazy and hard-to-predict behavior\nif some of the data required by the function is meant to be mutable shared\nstate.\n\nInstead, we wanted a library that was more typeful and constrained in terms of\nwhat was sent over the wire. This pushed us towards a design where, at the cost\nof some extra verbosity, we explicitly declare the types of data that is sent.\nIn exchange, we get a system that is easier to understand and debug.\n\nOne of the great things about Rpc_parallel is how fast it came together. Todd\ngot it into a sufficiently good state by the middle of the summer that he was\nable to use it for his other projects (interns typically have at least two major\nprojects over the course of the summer).\n\nRpc_parallel also benefitted from some world-hopping collaboration. Interns\nspend at least a week in an office other than their home office, and Todd ended\nup visiting Hong Kong. While there, he ended up spending a lot of time talking\nand working with Chengqi Song, who had a lot of experience with\nAsync_parallel. 
Out of those discussions came a complete redesign and rewrite\nof the library, factoring out the core primitives for coordinating across\nmultiple processes, and making the overall library simpler and more general.\n\nBy the end of the summer, a few other people picked it up and started using it for\nother projects, and last week, it was released open source, so you can take a\nlook at it yourself on github.\n\nCore_profiler\n\nProfiling is surprisingly hard, and as such it’s perhaps unsurprising that there\nare lots of ways of doing it. If you want to understand the cost of an\nindividual operation, you might want a micro-benchmarking library like\nHaskell’s Criterion, or our\nown Core_bench. If you’re trying\nto understand properties of a whole application, like which lines of code it’s\nspending its time on, or how many cache-misses are occurring, you might want to\nuse something like Linux’s perf tools, which use CPU-level counters to\nefficiently gather profiling statistics from your program.\n\nAnother useful technique is for the programmer to modify the source to add\nexplicit probes that keep track of when certain program points are reached, and\nwrite out that information to a log that can be analyzed after the fact.\n\nDaniel Richman (who will be returning for an internship next summer) worked\nalong with Roshan James (formerly an intern himself, now full-time) on a library\ncalled Core_profiler, which aims to make the use of such probes easy and\nefficient. Efficiency matters quite a bit, because if logging a probe takes more\ntime than the thing you’re trying to measure, you basically can’t extract\nreliable data. Keeping the overhead small, therefore, is a must.\n\nAccordingly, a lot of Daniel’s time was spent thinking very carefully about how\nto write the probes in a way that would only minimally disrupt the execution of\nthe program. 
He started with a simple but less efficient library, called\nStats_reporting, which took about 60ns and two words of allocation per probe,\nand started machining it down from there.\n\nThe first step was to avoid all allocation, which meant we could no longer use\nbin-prot, our standard binary serialization technique, since bin-prot\nrequires that you allocate an OCaml object representing the data to be\nserialized. So they moved to using an internal library called Protogen for\ngenerating zero-allocation serialization code.\n\nThat brought us down to about 30ns, and zero allocations. We then decided to try\nout writing our own hand-rolled binary protocol, so we could have a\nyet-more-closely optimized binary layout. That brought us down to about 20-25ns.\nThe next hack was to customize the binary format yet more by packing multiple\nvalues into a single, 63-bit OCaml integer. That, plus some cleverness to make\nsure that writes were word-aligned, brought the cost of a simple probe down to\n12ns. In addition to the design changes, Daniel also spent a lot of time\ncarefully examining the assembly code, to make sure that there were no surprises\nfrom the code generator.\n\nWe’re pretty happy with the end result. We think that Core_profiler probes are\nnow a good bit cheaper than the dynamic probes you can insert using a system\nlike DTrace, and are in any case the best\ntool we have for tracking the performance of even relatively quick code-paths.\nAnd, as a nice bonus to all this optimization, the offline processing got about\ntwice as fast as a side effect of the runtime improvements.\n\n\n\nThere’s more to be said about both of these projects, and about the many other\nprojects that were done this summer. If you’re interested in applying, you can\ndo so here:\n\nhttp://janestreet.com\n\nYou don’t need to know anything about finance or functional programming to\napply. But if you come, you’ll learn a ton about both by the time you’re done.\n",
        "url"      : "https://blog.janestreet.com/what-the-interns-have-wrought-rpc_parallel-and-core_profiler/",
        "image"    : null,
        "topic"    :  ["technology","internship"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "What is gained and lost with 63-bit integers?",
        "date"     : "September 29, 2014",
        "authorId" : "vbrankov",
        "author"   : "Vladimir Brankov",
        "tags"     : ["compiler","ocaml","performance","speed"],
        "minsToRead" : 3,
        "content"  : "Almost every programming language uses 64-bit integers on typical modern Intel\nmachines. OCaml uses a special 63-bit representation. How does it affect OCaml?\n\nOCaml int memory representation\n\nMost of OCaml’s types are in memory represented as a header followed by data.\nThe header is a 64-bit integer containing the length of the data and a tag. Tag\nis a rough classification of the type. The only OCaml’s types which differ from\nthis are ints and sometimes floats.\n\nFloats normally have header and data, data being the value of the float itself.\nThis representation is called “boxed”. If a record’s field is float, record’s\ndata will actually contain the pointer to the float data. The only exceptions\nare records with only floats and float arrays, whose data instead of pointers\ncontain the values of floats. This representation is called “unboxed”.\n\nValues of type int are never stored as header and data (boxed). Int x is\nstored as (x &lt;&lt; 1) | 1, where &lt;&lt; is left shift and | is bitwise or, hence\nits least significant bit is always set. Pointers are word aligned, so they will\nnever have this bit set, hence how ints and pointers are discerned. It is\nassumed that much of typical data is integers, so this is done to significantly\nimprove performance:\n\n\n  there’s no need to dereference a pointer when getting an int\n  no memory allocation is needed when creating ints\n  less work for the garbage collector\n  less memory fragmentation\n  no memory is needed for int headers\n\n\nDistinguishing whether a value is int or pointer is as simple as testing x &\n1, so this feature doesn’t slow down garbage collector, polymorphic hash,\npolymorphic compare and whatever else structurally inspects data. One should\nnote that this doesn’t apply to the types int32 and int64, which are always\nboxed.\n\nPenalty\n\nHaving the extra bit comes with a price – arithmetic operations are more\ncomplicated. 
For example\n\n\n  x + y is translated to CPU instructions x + y - 1\n  x * y is translated to CPU instructions (x &gt;&gt; 1) * (y - 1) + 1\n  x / y is translated to CPU instructions (((x &gt;&gt; 1) / (y &gt;&gt; 1)) &lt;&lt; 1) + 1\n  x lsl y is translated to CPU instructions ((x - 1) &lt;&lt; (y &gt;&gt; 1)) + 1\n\n\nSometimes this penalty is small or nonexistent. For instance there’s no need to\nfix the bit in x + y - z. Only one bit fixing is needed for all five\nadditions in x + y + z + w + u + v.\n\nAnother help is the Intel CPU instruction LEA, which can compute the sum of\nthree integers with a single instruction, like x + y - 1. Unfortunately,\nLEA became very slow in recent generations of CPUs. Intel doesn’t suggest\nthis will change.\n\nThis benchmark (test.ml) tries to estimate the difference in performance.\nThe results from Sandy Bridge show about a 2x speed difference in arithmetic\noperations. Assembly can be examined by compiling using “ocamlopt -S test.ml”.\n\nspeed(ns)       63-bit   64-bit   slowdown\nadd independent 0.327588 0.121502 2.696156\nadd   dependent 0.328160 0.169375 1.937477\nmul independent 0.614824 0.328060 1.874120\nmul   dependent 1.094343 0.328872 3.327565\nlsl independent 0.359828 0.166088 2.166489\nlsl   dependent 0.762251 0.177468 4.295151\nxor independent 0.249350 0.122900 2.028886\nxor   dependent 0.404255 0.170380 2.372668\n\n\n\nAgner’s instruction tables show that the difference is even bigger with later\ngenerations of CPUs. For instance, Haswell can do four integer adds per cycle\nversus one LEA.\n\nConclusion\n\nThe benefits of unboxed ints are amazing. On the other hand, arithmetic\noperations are significantly slower. How much do arithmetic operations affect an\naverage program? Could we have a solution which would keep ints unboxed but have\nfast arithmetic operations?\n",
        "url"      : "https://blog.janestreet.com/what-is-gained-and-lost-with-63-bit-integers/",
        "image"    : null,
        "topic"    :  ["technology","compiler","ocaml","performance","speed"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Clearly Failing",
        "date"     : "August 23, 2014",
        "authorId" : "dpowers",
        "author"   : "David Powers",
        "tags"     : [],
        "minsToRead" : 12,
        "content"  : "The Parable Of The Perfect Connection\n\nEvery programmer in the Intertube connected era eventually has to write, or at\nleast use, an API for a network service – something like a database, a message\nqueue, or web service. And, each and every one of them begins with the\nenthusiasm of the recently inducted as they realize that they can reach out\ntheir hand and control something else. And, each and every one of them\nexperiences that moment of frustration and anger when they realize that their\nbuddy out in cyberspace is a bit of a flake.\n\nNow, we aren’t talking about a seriously unreliable friend. In fact, your buddy\nisn’t really unreliable at all. He’s there 99.9% of the time, and even when\nhe’s out for a quick coffee break he tends to come back quickly. Besides, you\ndon’t have any real control over him. He’s maintained by some other people in a\nbunker far far away. Those people are confusing, hard to reach, and don’t seem\nto care about your problems. So, you do what countless programmers have done in\nthe past…\n\nYou write a loop.\n\nlet rec connect_until_success host_and_port =\n  connect host_and_port\n  &gt;&gt;= function\n  | Ok t -&gt; t\n  | Error _ -&gt;\n    after (sec 5.)\n    &gt;&gt;= fun () -&gt;\n    connect_until_success host_and_port\n\n\n\nBecause you are feeling helpful you enshrine the loop in an API to help other\npeople, because, after all, your buddy is pretty reliable, and it would be a\nshame if other people had to deal with all the nasty complexity that you’ve just\nprogrammed away.\n\nThere are a lot of classic twists and variations on this core storyline:\n\n\n  \n    count the number of failures and give up after x tries (x is usually 3 or 1)\n  \n  \n    back off exponentially so you don’t “hammer” the service\n  \n  \n    don’t wait at all and actually hammer the service in a tight loop because\nlatency is important\n  \n  \n    log the error, because someone will look at the logs carefully. 
Then retry.\n  \n  \n    keep careful track of the time of the last failure, and always retry, unless\nthe last retry was “recent”, because one blip makes sense but not two.\n  \n  \n    return an elaborate type that encompasses all possible failure modes,\nincluding the fact that we are retrying. Maybe deliver that information in a\nside channel stream of updates.\n  \n  \n    forget giving people a connect method at all. Just give them a query\nmethod and handle the pesky connection details away from prying eyes. You\nget bonus points if the API doesn’t look like you can ever fail.\n  \n\n\nHidden Failure Is Still Just Failure\n\nSadly, the problem isn’t in the cleverness of the technical wizardry you use to\ncover up for your buddy, it’s the fact that covering up failure is just another\nform of failing.\n\nThe connection failed. Not telling the world outside of your API is like hiding\na bad grade from your parents. They might not catch you once or twice, but you\nstill got the bad grade, and eventually they are going to notice that something\nis very very wrong – likely after things have really gone off the rails.\n\nWhich leads us to three useful principles of failure that apply to self-healing\nnetwork connections, and most other failure besides.\n\nFail Quickly, Clearly, and Cleanly\n\nWhen you design an API, or a system, or even a big complex collection of\nsystems, and you think about how it should fail, make sure that the failure is:\n\n\n  \n    Quick: Taking too long to fail is a cardinal sin. Don’t retry a thousand\ntimes, don’t get an hour deep into a computation only to realize that one of\nthe config parameters is bad, and don’t forget to add a timeout when the\nother side might never respond. The sooner you can tell the outside world\nthat you have failed the sooner it can react.\n  \n  \n    Clear: Make sure that your failure behavior is clear, well documented,\nand can’t be missed in a decently written program. 
It should be obvious from\na read of the API and shouldn’t require a dip into the underlying code to\nunderstand. Beyond that, don’t mumble when you fail (I’m looking at you\nerrno in C). Similarly, don’t go on about all the little nuances surrounding\nyour failure with a 20 case variant response. Most API consumers only care\nabout the binary state of failure in the code. The details are generally\nuninteresting outside of debug logs and human readable messages.\n  \n  \n    Clean: Clean up anything and everything you can after you fail, as\naggressively as you can. That means close your file descriptors, free your\nmemory, kill your child process. Work harder than normal to make the cleanup\nportion of your code simple and obviously correct. But still remember to be\nquick. Do your work after you tell everyone that you have failed if there\nis any chance that you won’t succeed. Don’t be that function/program/system\nthat never responds again because it hung trying to clean up before it\nreported the error.\n  \n\n\nHow Should It Look?\n\nSomething like the following API, comments and all.\n\nThis makes heavy use of some\nnice\nthings from our\npublicly released libraries. If you aren’t already familiar with them you can\ntake a deeper look here.\n\nIf you want the TLDR version, you really only need to understand Deferred and\nOr_error to get the gist.\n\nA Deferred is a value that will get filled in at some point in the future\n(these are sometimes called\npromises), and when you\nread it here it just means that the function doesn’t return immediately –\nusually because some network communication needs to happen to get the result.\n\nOr_error is a fancy way of saying, “this might work, or it might give you an\nerror”. Returning an Or_error forces the caller to check for an error case in a\nvery clear and explicit way. 
It’s our standard way in an API to indicate that a\nfunction might not succeed because, unlike a comment about an exception that\nmight be thrown, or a special return value (like NULL), Or_error can’t be\nmissed.\n\nSo, if you see something like:\n\nresponse Or_error.t Deferred.t\n\n\n\nYou can read it as, “this won’t return immediately, and when it does it will\neither be an error, or a response”.\n\ntype t\n\n(** connect to the service, returning t or an error if the connection could not\n    be established. *)\nval connect : ?timeout:Time.Span.t -&gt; ... -&gt; t Or_error.t Deferred.t\n\n(** a simple helper function that calls connect with the original parameters.\n    The passed in t is always closed when reconnect is called.  Multiple calls\n    to reconnect on the same t will result in multiple connections. *)\nval reconnect : t -&gt; t Or_error.t Deferred.t\n\n(** connects to the service and runs the provided function if successful.\n    If the connection fails or [f] raises, an Error is returned.  [close] is\n    automatically called on [t] when [f] completes or raises. *)\nval with_t\n  :  ?timeout:Time.Span.t\n  -&gt; ...\n  -&gt; f:(t -&gt; 'a Deferred.t)\n  -&gt; 'a Or_error.t Deferred.t\n\n(** If timeout is not given it defaults to a sensible value. *)\nval query : t -&gt; ?timeout:Time.Span.t -&gt; ... -&gt; response Or_error.t Deferred.t\n\nval query_exn : t -&gt; ?timeout:Time.Span.t -&gt; ... -&gt; response Deferred.t\n\n(** If timeout is not given it defaults to a sensible value.  The returned\n    reader will be closed when the underlying connection is closed, either by\n    choice or error.  It is a good idea for the update type to express the closed\n    error to differentiate a normal close from an error close.  *)\nval pipe_query\n  :  t\n  -&gt; ?timeout:Time.Span.t\n  -&gt; ...\n  -&gt; update Pipe.Reader.t Or_error.t Deferred.t\n\nval pipe_query_exn : t -&gt; ?timeout:Time.Span.t -&gt; ... 
-&gt; update Pipe.Reader.t Deferred.t\n\n(** close is idempotent and may be called many times.  It will never raise or\n    block.  Once close has been called all future queries will return Error\n    immediately.  A query in flight will return error as soon as possible. *)\nval close : t -&gt; unit\n\n(** fulfilled when t is closed for any reason *)\nval closed : t -&gt; unit Deferred.t\n\n(** closed is an error state.  Once a connection is in an error state it will\n    never recover. *)\nval state : t -&gt; unit Or_error.t\n\n\n\nSeriously, Never?\n\nUp until now I’ve been making the case for try once, fail quickly and clearly,\nand I think that much, if not most of the time, it’s the argument that should\nhold. But the world is a complex place. Sometimes things fail, and somebody\nsomewhere has to try again. So where should that happen, and what should we\nconsider when we start talking about retry logic?\n\nHow will this stack?\n\nLoops stack poorly and lead to confusing non-linear behavior. This means that\nyou should usually confine retry logic to a component near the bottom or the top\nof your stack of abstractions. Near the bottom is nice, because, like TCP,\neveryone can rely on the behavior. Near the top is nice because you have the\nmost knowledge of the whole system there and can tune the behavior\nappropriately. Most network service API’s are in the middle somewhere.\n\nCan I opt out?\n\nTCP sits on top of UDP and provides a solid retry mechasnism that works really\nwell for most of the world, but it would be a mistake in design to only expose\nthe TCP stack. If you are going to provide a self-healing connection/query\nsystem as part of your API, make sure to build and expose the low level simple\nAPI too. This lets clients with needs you didn’t anticipate interact in the way\nthat they want.\n\nLove shouldn’t be forever\n\nIt’s more likely to be a mistake to try forever than to retry once, or for a set\nperiod of time. 
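The shape of a bounded retry is worth pinning down. Here is a minimal sketch in Python (the names `Retryable` and `retry_for` are made up for illustration; this is not code from our libraries): retry only errors explicitly marked as retryable, back off between attempts, and give up at a deadline.

```python
import time

class Retryable(Exception):
    """An error judged worth retrying (e.g. a dropped connection)."""

def retry_for(f, deadline_secs, backoff_secs=0.1):
    """Call f until it succeeds, a non-retryable error occurs,
    or deadline_secs elapses.  Never loops forever."""
    give_up_at = time.monotonic() + deadline_secs
    delay = backoff_secs
    while True:
        try:
            return f()
        except Retryable:
            if time.monotonic() + delay > give_up_at:
                raise  # the "transient" error outlasted our patience: fail clearly
            time.sleep(delay)
            delay *= 2  # back off, so resource usage stays bounded
```

A malformed query raised as an ordinary exception propagates on the first attempt, while a dropped connection raised as `Retryable` gets retried until the deadline.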
It’s one thing to protect a client against a transient failure,\nbut when the transient error lasts for minutes or hours, it’s probably time to\ngive up.\n\nYour resource usage should be bounded\n\nLoops, especially loops that create and clean up resources, have a tendency to\nconsume more than their fair share. This is especially true when the loop is\ntrying to cover for an error case, where things like resource cleanup might not\nwork entirely as advertised. So, it’s on the writer of a loop to test it heavily\nand to have strong bounds on how much CPU, memory, file handles, bound ports,\netc. a single self-healing connection can take. Getting this right is hard, and\nyou should be nervous about doing it quickly.\n\nHow bad is failure?\n\nIt’s much easier to justify a looping retry if it’s the only thing keeping a\nlarge complex system from crashing completely, and it’s correspondingly harder\nto justify when it covers just one more case that any client needs to deal with\nanyway. For instance, a retry loop on my database connection might cleanly cover\nthe occasional intermittent outage, but there are probably real reasons that the\ndatabase might be out (network failure, bad credentials, maintenance window),\nand my program likely has to handle this case well anyway.\n\nNot all failure is created equal\n\nSome failures justify a retry. Some failures don’t. It’s important in retry\nlogic to avoid big try/with blocks that catch any and every error on the\nassumption that any query or connection will eventually succeed. Retrying\nbecause my connection closed is different from retrying my malformed query.\nSadly, you can’t always tell the difference between the two cases, but that\ndoesn’t mean you shouldn’t make an effort.\n\nYou still have to consider failure\n\nYou can use a retry loop to limit errors above a certain abstraction boundary,\nor to limit the impact of small glitches, but you can’t recover gracefully from\nall of the errors all of the time. 
When you add a retry loop to your system at\nany level, stop to consider what should happen when the error is a real error and\nisn’t transient. Who is going to see it? What should they do about it? What\nstate will clients be in?\n\nIt’s easier to solve a specific problem than a general one\n\nIt’s much easier to come up with retry logic that makes sense for a particular\napplication in a particular environment than it is to come up with retry logic\nthat is generically good for all clients. This should push you to confine retry\nlogic to clients/APIs that have a single well-considered role and to keep it\nout of APIs that may be used in many different contexts.\n\nQuick, Clear, and Clean still (mostly) apply\n\nEven when you are considering retry logic, make sure you think about getting\nstuck (quick), getting debug information about your state to the outside world\n(clear), and keeping resource usage bounded (clean).\n",
        "url"      : "https://blog.janestreet.com/clearly-failing/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "The ML Workshop looks fantastic",
        "date"     : "July 31, 2014",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "I’m a little biased, by being on the steering committee, but this year’s ML\nworkshop looks really interesting. Here’s a link to the program:\n\nhttp://okmij.org/ftp/ML/ML14.html\n\nIt has a bunch of interesting papers, including work on cleaning up and\nsimplifying first-class modules, type-level module aliases, new approaches in\ntype-directed test generation, a couple of papers on implicits (how they’ve\nworked out in Scala, and a proposed approach in OCaml), and more.\n\nThe OCaml users-and-developers\nmeeting is also looking\ngood, though for me there’s a clear favorite among the presentations: OCaml\nLabs’ announcement of version 1 of the OCaml Platform. I think that’s a pretty\nimportant milestone for the language.\n\nAll told, it’s shaping up to be a good ICFP.\n",
        "url"      : "https://blog.janestreet.com/the-ml-workshop-looks-fantastic/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Simple top-down development in OCaml",
        "date"     : "July 18, 2014",
        "authorId" : "ceastlund",
        "author"   : "Carl Eastlund",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "Often when writing a new module, I want to write the interface first and save\nthe implementation for later. This lets me use the module as a black box,\nextending the interface as needed to support the rest of the program. When\neverything else is finished, I can fill in the implementation, knowing the full\ninterface I need to support. Of course sometimes the implementation needs to\npush back on the interface – this pattern isn’t an absolute – but it’s certainly\na useful starting point. The trick is getting the program to compile at\nintermediate stages when the implementation hasn’t been filled in.\n\nThe longhand way I’ve used previously is update the .mli file as I go with the\ninterface, and fill in stubs for each definition in the .ml file. For example,\nlet’s say I’m writing a stack module:\n\n(* stack.mli *)\ntype 'a t\nval empty : 'a t\nval push : 'a t -&gt; 'a -&gt; 'a t\n\n(* stack.ml *)\ntype 'a t\nlet empty = failwith \"unimplemented\"\nlet push = failwith \"unimplemented\"\n\n\n\nIf I want to add a new value, I can just add a new stub:\n\n(* stack.mli *)\ntype 'a t\nval empty : 'a t\nval push : 'a t -&gt; 'a -&gt; 'a t\nval pop : 'a t -&gt; ('a t * 'a) option\n\n(* stack.ml *)\ntype 'a t\nlet empty = failwith \"unimplemented\"\nlet push = failwith \"unimplemented\"\nlet pop = failwith \"unimplemented\"\n\n\n\nThis works: I haven’t had to commit to a representation for stacks or an\nimplementation for the operations, but my whole program can use the stack module\nand will still compile. Of course, nothing will run until I get rid of those\nexceptions. On the other hand, this is still more work than I’d like. I have to\nfill in stubs for every definition; in many cases, the stubs themselves are\nlonger to write than their types in the .mli file. As my interface changes, I\nhave to do as much work in the .ml file as I do in the .mli to keep the two in\nsync. It turns out I can do better. 
By rearranging the files, I can just update\nthe interface as I go:\n\n(* stack_intf.ml *)\nmodule type S = sig\n  type 'a t\n  val empty : 'a t\n  val push : 'a t -&gt; 'a -&gt; 'a t\n  val pop : 'a t -&gt; ('a t * 'a) option\nend\n\n(* stack.mli *)\ninclude Stack_intf.S\n\n(* stack.ml *)\ninclude (val (failwith \"unimplemented\") : Stack_intf.S)\n\n\n\nNow I can just add to stack_intf.ml as I go. Both stack.mli and stack.ml will\ntake on the right module types automatically, without the need to add explicit\nstubs for type or value definitions. Once I’m done filling in the interface, I\ncan paste the module type into stack.mli, remove stack_intf.ml, and start\nfilling in stack.ml with real definitions. But the initial phase is much easier\nfor only having to add exactly what I need.\n",
        "url"      : "https://blog.janestreet.com/simple-top-down-development-in-ocaml/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "What's in a name?",
        "date"     : "July 10, 2014",
        "authorId" : "oshivers",
        "author"   : null,
        "tags"     : [],
        "minsToRead" : 14,
        "content"  : "\n  In the once upon a time days of the First Age of Magic, the prudent sorcerer\nregarded his own true name as his most valued possession but also the greatest\nthreat to his continued good health, for—the stories go—once an enemy, even a\nweak unskilled enemy, learned the sorcerer’s true name, then routine and\nwidely known spells could destroy or enslave even the most powerful. As times\npassed, and we graduated to the Age of Reason and thence to the first and\nsecond industrial revolutions, such notions were discredited. Now it seems\nthat the Wheel has turned full circle (even if there never really was a First\nAge) and we are back to worrying about true names again.\n\n  – True Names, V. Vinge\n\n\nOne of those moments\n\nI was talking to a networking-researcher friend1 one day, and I\nsuddenly remarked on what a large percentage of our conversation had to do with\nnames. He looked at me and said: “Well… yeah. Naming things is one of the\nfundamental things computer scientists do.”\n\nThe light bulb that went off in my head was so blinding, I dropped out of the\nconversation for about twenty or thirty seconds of dead silence while I tried to\ndeal with his statement.\n\nBut, really, I’ve been dealing with it ever since. 
I had this conversation\naround 1993, and since then, it seems that the concept of names has come up over\nand over in my life working with information architectures.\n\nDistal access and symbol systems\n\nThe best explanation of names I ever got—what they are and why they\nmatter—didn’t come from a programming-languages person, or a networking person.\nIt came from a scientist who worked in Artificial Intelligence: my thesis\nadvisor, Allen Newell2.\n\nNewell and Simon’s Turing award cited them for their work on “symbol systems.”\nNewell once told me that this was just names, and then he explained his\nunderstanding of names: “They provide distal access.” That is, a name is a\nlocal piece of data that stands for some other piece of data, which is\npresumably large and remote. You can now use that small, convenient, local datum\ninstead of the large, remote thing for which it stands. The key act you perform\nwith a name (that is, a symbol) is ship it to that remote location, and get back\nthe chunk of data it named. Newell said the career-making, fundamental “aha”\nexperience of his entire life was realising that computers were not, as was\ntypically held in the 1960’s, “number crunchers.” They were symbol\nprocessors—something much more general. They processed names.\n\nIf you accept Newell’s definition, you suddenly start seeing names everywhere,\nat every scale:\n\n\n  Moving a register id from the instruction-fetch unit on one side of a\nCPU to the register bank on the other;\n  Shipping an address from the CPU to the memory system;\n  Referencing a variable in a tight loop that was bound on entry to the\ncontaining procedure;\n  Sending a host name to a DNS server down the hall;\n  Sending a URL from my web browser to a server on the other side of the\nplanet;\n  Using a ten-kilobyte BitTorrent file to download a multi-gigabyte movie\noff the net.\n\n\nThese are all just names and their associated dereferencing operations. 
Any time\nyou see a system that has a notion of “cookies” or “handles:” those are just\ndifferent names for names.\n\nAs far as Newell was concerned, this is the purpose served by names, or symbols,\nin all computational systems. Including the one in your head. When he said\n“distal access,” he assumed that you, too, have structures on one side of your\nhead representing what you know about, say, Neil Armstrong, and a smaller\nstructure on the other side of your brain encoding the name “Neil Armstrong,”\nand cognitive mechanisms allowing you to fetch the former given the latter.\n\nTrue names, finance and prudent sorcerers\n\nThe BitTorrent example above is particularly interesting, since it comes with an\nunusual, distributed dereferencing mechanism. A BitTorrent name also has the\nadditional exotic property of being a “true name,” in that it names one and\nonly one piece of data. You don’t need to trust the distributed store: every\nidentifier includes enough information for you to verify that what your\ndereference produces is the thing originally named.\n\nTo invoke my opening quotation from the novel True Names, cultures have always\nattached a kind of magic to names, reflecting an intuitive understanding that\nnames convey power and control; the reason for this is reflected in Newell’s\nsummary of names as being a means of access. According to the book of Genesis,\nfor example, the first act of Man was the assignment of names,3\nsomething which symbolically (there’s that word, again) represents a transfer of\ncontrol over the material world, as it is handed from its divine source over to\nhuman dominion.\n\nShifting from the sacred to the profane, these days I’m working with a bunch of\nfinance people, who are so concerned with the issue that they devote really\nastounding amounts of brainpower and labor to keeping straight the names of\nthings. An example will show why. 
Financial people not quite as obsessive about\nnames as Jane Street’s programmers have been known to use the name “TWTRQ” to\npurchase sizeable lots of Twitter stock. This is not a very good idea, because\nthat actually gets you shares in the company Tweeter. Which is bankrupt. Twitter\nis traded on the stock exchange as “TWTR,” not “TWTRQ.” Likewise, Comcast has\ntwo different listings on the exchange (CMCSA and CMCSK), which get you\ndifferent kinds of stock.\n\nOh, and the people who work on this stuff at Jane Street call what they do,\n“symbology.” They must have read Newell and Simon’s Turing Award lecture, I\nguess4.\n\nSharing and arrows\n\nAnother fundamental property I’d note that names have—besides ubiquity, and this\nnotion of “distal access”—is that they give you the ability to refer to a thing\nmultiple times. That’s always a hint that maybe you need to be thinking about\nnames. For example, if I have a computation represented by some expression\n〈exp〉, and I want to do exactly one thing to the value it produces—say, I\nwant to add 5 to it—then I simply write 〈exp〉 + 5\n\nNo need to name the value. But if I want to do two things to it, then I need\nto name it, so I can reference it:\n\nlet x = 〈exp〉 in\nprint(x); (* first use  *)\nf(x + 5); (* second use *)\n\n\n\nOne way to view this is to say that a context-free grammar (like the one that\nsays what strings are legal Java programs) turns a string of text into a tree\nstructure, the parse tree. Once you put names into a language, however, your\ntree can now encode a DAG or (with recursive “letrec” scope) a general\ngraph—names let you encode control- and environment-structure loops in your\ntree. When you need to write down something with DAG or cyclic structure, that’s\na hint you need to start thinking about some kind of a language with names in\nit.\n\nSo, I’ve now said the same thing a couple of times. Why not say it again? 
A name\nis an arrow: a link from arrow tail (reference) to arrow head (binding). (Or as\ncompiler hackers prefer to say, from “use” to “def”) Some people take this arrow\nidea quite literally—for example, the Racket\nlanguage’s development environment will show you these links as actual arrows on\nyour screen when you hover your mouse over a variable.\n\n\nThe art of names\n\nEchoing my networking friend, I have a lot of respect for people who name well.\nName choices inflict specific thought processes on people who use them; bad name\nchoices inflict perverse or misleading thought processes, and make it hard to\nunderstand what’s happening in a system. Good name choices make it easy and\nnatural to do the right thing—like expressive, well-chosen types, they lead you\neffortlessly to the terms you wanted to write.\n\nThis is because names used by humans come with baggage. They fit into a larger\nframework. When you can get that framework right, and stick to it, then the\nnames come easy—not just when they are minted, but when someone is trying to\nrecall the name for some existing thing.\n\nFor example, when I’m using a library, and discover that\n\n\n  you make new hash tables with hashtable_create,\n  but new red/black trees with make_rbtree,\n  and new skip lists with NewSkipList,\n\n\nI cringe. Much, much better to fix on a single lexeme, such as “create”, and use\nthat everywhere in the names of functions that make / create / allocate new\nthings: create_hashtable, create_rbtree, and create_skiplist. 
Consistently\nconstructing your names from a well-chosen set of such parts means that, once\nclients of your system have seen a couple of representative names, they can more\nor less guess the existence of functions they’ve never even seen, without\nhaving to paw through documentation or stop and look things up.\n\nA good heuristic for naming is Rob Pike’s well-known dictum that the length and\ndetail of a name should be proportional to its scope:\n\n\n  A variable that is referenced across multiple files might do well to have a\nnice, long, descriptive name, such as “credit_card_expiration”.\n  A variable whose scope extends across an entire file, so that its definition\nis separated from its references by multiple screenfuls of text, likewise\nshould have a fairly detailed name—enough detail so that a programmer coming\nacross a reference to the variable doesn’t need to go hunting for its\ndefinition to understand what it references.\n  A variable used within the body of a fifteen-line function can be compressed\na bit, e.g., “cc_exp”.\n  An index variable used in a three-line loop? Something short, like “i”,\nhas the benefit of terseness, which allows the rest of the structure of the\nloop to be more easily perceived.\n\n\nPike and his colleague Peter Weinberger’s paper, “The hideous name,” is an\nextended cri de cœur about the metastasis of conflicting, poorly-designed\nnaming mechanisms that came about in the early 1980’s for email addressing.\nAlthough these problems have largely been resolved since it was written, the\nissues of name-space design raised by the paper never go away, and the\nprinciples and general discussion in the paper remain quite relevant in this\nbroader context.\n\nWhen I see someone agonising over what is just the right name for something he\nis defining, I relax a little bit: I know I’m working with someone who gets\nthings right. 
Because naming things is one of the fundamental things engineers\ndo.\n\nThe logic of names\n\nOne final remark about names. We use names all our lives, every day, all day. So\nthey seem obvious and not so mysterious. But names are subtle. There is a sense\nin which the λ calculus is nothing more, really, than a theory of: names. Go\nlook at the fundamental, definitional rules of the λ calculus. They are all\nabout manipulating names! Consider, for example, β reduction, which slips some\ntricky renaming steps in behind the scenes as you proceed down into the redex.\nLikewise, the α rule tells you what it is about names that doesn’t matter, and\nwhat it is about names that is of essence.\n\nThe λ calculus was a logician getting the handling of names right. It’s amazing\nto me that this is such a recent step forward for the human race, something that\nhas happened within living memory. And we’ve been working on names formally,\nnot just in the street, for quite a while: a Greek formal-methods friend of\nmine5 gives Aristotle credit for articulating the notion of\n“variables”—that is, names. When it takes over 2300 years to really nail down an\nidea, you figure the subject might be a little deeper than you initially\nsupposed. (And I’m skipping over all the post-Aristotle / pre-Church work done\nby logicians on handling names as they are used in quantified logics.)\n\nAnother sign this is subtle is how many smart people got this wrong in the ’70s\nand ’80s by designing languages with dynamic scoping for their name handling.\n(Not that I’m, uh, naming any names.)\n\nYet another sign is that, in 2014, it’s still a hot research topic! All the\nwork on “macro hygiene” that comes out of the Scheme community? It’s about\nnames. The recent work done by Andy Pitts, Francois Pottier and others on\nnominal logics and nominal types? Nominal logic is just what the name says: a\nsystem whose entire raison d’être is manipulating names. 
Bob Harper and Derek\nDreyer’s work on module structures? Straightening out names as they are used in\ntype classes and program modules.6\n\nNames have been significant in my own research life. For example, one of the\nmost fun research results I ever had was an algorithm I developed with Mitch\nWand that exploited a surprisingly simple data structure for representing the\narrows that are names. I did some work in grad school on a family of program\nanalyses that featured varying degrees of precision in how they abstracted\nenvironment structure: the key to the whole thing lay in the semantic\nmechanism that manages name spaces and name binding. Two of my top grad students\nhave both done even more recent dissertations on novel, exciting mechanisms for\nreasoning about environments and the names they manage: Dimitrios Vardoulakis\njumped the power of the abstraction up from finite to infinite domains, which is\nnot so easy to do in a finite analysis; Matt Might’s dissertation had no less\nthan three distinct innovations concerning the design and management of abstract\nenvironments: abstract counting, abstract GC and frame-string contours.\n\nBut that’s what’s going on at the frontiers of scientific knowledge. Returning\nto the trenches of designing information systems and just doing my day-to-day\nprogramming: when I worry about handling names right, I tend to stop and bless\nthe name of Church for getting things sorted for me, and am grateful I get to\nwork in a tool for expression directly based on his results. 
As the man said of\nthe λ calculus, “There may, indeed, be other applications of the system than its\nuse as a logic.”\n\nWell… yeah.\n\n     -Olin\n\n\n\nAcknowledgements\n\nBesides the names already mentioned in this essay, I’m also indebted to Harry\nMairson and Alan Bawden for contributing to my ongoing education on the subject\nof names.\n\nFootnotes\n\n\n  \n    \n      John Wroclawski &#8617;\n    \n    \n      I actually had two thesis advisors: Peter Lee and Allen Newell.\nThis is like winning the lottery two days consecutively. &#8617;\n    \n    \n      Genesis 2:19 “And out of the ground the Lord God formed every beast\nof the field, and every fowl of the air; and brought them unto Adam to see\nwhat he would call them: and whatsoever Adam called every living creature,\nthat was the name thereof.” &#8617;\n    \n    \n      I like this better than the alternate theory, anyway, which\nis that they are all huge fans of The Da Vinci Code. &#8617;\n    \n    \n      Panagiotis Manolios &#8617;\n    \n    \n      …or so I am reliably assured by my friends who have the\nintellectual horsepower to understand their papers. &#8617;\n    \n  \n\n",
        "url"      : "https://blog.janestreet.com/whats-in-a-name/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Inspecting Internal TCP State on Linux",
        "date"     : "July 9, 2014",
        "authorId" : "cperl",
        "author"   : "Chris Perl",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "Sometimes it can be useful to inspect the state of a TCP endpoint. Things such\nas the current congestion window, the retransmission timeout (RTO), duplicate\nack threshold, etc. are not reflected in the segments that flow over the wire.\nTherefore, just looking at packet captures can leave you scratching your head as\nto why a TCP connection is behaving a certain way.\n\nUsing the Linux ss utility coupled with crash, its not too difficult to\ninspect some of the internal TCP state for a socket on Linux. Figuring out the\nmeaning of all variables and how they relate to the variables referenced in the\nmany TCP RFCs and papers is another matter, but at least we can get some idea of\nwhat is going on.\n\nFirst, you can ask ss to give you information about, say, NFS sockets in use\non a given client system:\n\n[cperl@localhost ~]$ ss -eipn '( dport = :nfs )'\nState       Recv-Q Send-Q    Local Address:Port    Peer Address:Port \nESTAB       0      0         192.168.1.10:975      192.168.1.200:2049   ino:12453 sk:ffff8802305a0800\n     ts sack cubic wscale:6,7 rto:201 rtt:1.875/0.75 ato:40 cwnd:10 ssthresh:40 send 61.8Mbps rcv_rtt:1.875 rcv_space:1814280\nESTAB       0      0         192.168.1.10:971      192.168.1.201:2049   ino:16576 sk:ffff88022f14d6c0\n     ts sack cubic wscale:6,7 rto:202 rtt:2.125/1.75 ato:40 cwnd:10 ssthresh:405 send 54.5Mbps rcv_rtt:5 rcv_space:3011258\n\n\n\nInternally, ss uses the tcp_diag kernel module to extract extra information\n(this is done via an AF_NETLINK socket).\n\nA lot of interesting TCP state is provided in this output. 
For example, you can\nsee the current retransmission timeout (“rto”), the current buffer space\navailable for receiving data (“rcv_space”), the congestion control algorithm\n(“cubic”) and you can see what the window scale option for the connection is\n(the number before the comma is the scaling applied to the window offered by the\nremote endpoint and the number after the comma is the scaling the remote\nendpoint will be applying to the window offered by us (i.e. its the Window Scale\noption we sent in our initial SYN). Some of the other variables are interesting\ntoo, but going into details on all of them is beyond the scope of this blog\npost.\n\nIf you’re really, really interested in the kernel’s internal state, you can also\ntake the address of the struct sock that ss gave you (e.g.\nsk:ffff8802305a0800) and inspect it with crash:\n\n[cperl@localhost ~]$ sudo crash -e emacs\n...\n      KERNEL: /usr/lib/debug/lib/modules/2.6.32-431.1.2.0.1.el6.x86_64/vmlinux\n    DUMPFILE: /dev/crash\n        CPUS: 4\n        DATE: Tue Jul  1 15:26:19 2014\n      UPTIME: 1 days, 07:32:48\nLOAD AVERAGE: 0.08, 0.05, 0.01\n       TASKS: 871\n    NODENAME: localhost\n     RELEASE: 2.6.32-431.1.2.0.1.el6.x86_64\n     VERSION: #1 SMP Fri Dec 13 13:06:13 UTC 2013\n     MACHINE: x86_64  (2992 Mhz)\n      MEMORY: 7.9 GB\n         PID: 29732\n     COMMAND: \"crash\"\n        TASK: ffff88013a928080  [THREAD_INFO: ffff88011b548000]\n         CPU: 1\n       STATE: TASK_RUNNING (ACTIVE)\n\ncrash&gt; struct tcp_sock.rcv_nxt,snd_una,reordering ffff8802305a0800\n  rcv_nxt = 3794247234\n  snd_una = 2557966926\n  reordering = 3\n\n\n\nBecause of the way Linux stores the structures in memory, you can just cast the\nstruct sock to a struct tcp_sock. 
If you leave off the specific members in\nthe “struct” invocation above you can get a recursive dump of all the fields and\nthe structures embedded within (its just too large to be useful in this blog\npost).\n\nIt’s possible you might not be able to get what you want just using crash and\nmay want to turn to a tool like SystemTap to further figure out what is going\non, but this is a decent place to start.\n",
        "url"      : "https://blog.janestreet.com/inspecting-internal-tcp-state-on-linux/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Making “never break the build” scale",
        "date"     : "July 6, 2014",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 6,
        "content"  : "I just stumbled across a post from\nearlier this year by Graydon Hoare, of Rust fame.\nThe post is about what he calls the “Not Rocket Science Rule”, which says that\nyou should automatically maintain a repository that never fails its tests. The\nadvantages of the NRS rule are pretty clear. By ensuring that you never break\nthe build, you shield people from having to deal with bugs that could easily\nhave been caught automatically.\n\nThere are a few ways of implementing the NRS rule. Graydon describes the\nnow-defunct Monotone VCS as a system meant to implement this in a distributed\nway. But you can also write a simple set of scripts that handle it in a\ncentralized manner. To do this, you create a build-bot that evaluates every pull\nrequest, merges (or rebases) it with the latest tip as necessary, and then makes\nsure that it builds cleanly and that all the tests pass before allowing it to\nbecome the new released tip.\n\nGraydon notes that lots of people get this wrong. They have automated test\nsuites, but the test suites are run on code after it’s integrated with the main\nbranch, not before. This is a lot worse, exposing people to buggy code, and,\nmore subtly, because it means that when things break, you have to scramble to\nfigure out who broke the build so you can get them to fix it.\n\nIf the NRS rule is such a good idea, why isn’t it used more often? Part of the\nreason, I suspect, is the lack of good tools. Indeed, when we wanted to do this\nat Jane Street, we ended\nup building our own.\n\nBut I think there’s more to it than that. Another problem with the NRS rule is\nthat it’s not clear how to make it scale. To see why, consider the naive scheme\nGraydon mentions for implementing NRS: for each pull request, you merge it with\nthe tip, compile and build the result, run all the unit tests, and then, if all\ngoes well, bless that as the new tip of the repo. 
This process is sequential,\ndeciding on a single, linear order in which the merges happen, building and\ntesting each proposed change along the way.\n\nThis means if verifying a pull-request takes m minutes, and you have n pull\nrequests, the release time is going to take at least m * n minutes. At one\npoint, this was a rather serious problem for us. We had lots of people\nconstantly pushing small changes, each of which had to make its way through the\nbuild-bot. And some of these runs end by rejecting the feature, which means that\na single pull request might need to make multiple attempts to get released. To\nmake matters yet worse, a full build of the tree was expensive, taking two hours\nand up.\n\nThe end result is that things would get backed up, so that a change would\nsometimes take 24 hours to get through the build-bot. We knew we needed to fix\nthis, and here are some of the ideas we tried out.\n\n\n  \n    Simple speculation: If you have a high reject rate, you can increase your\nthroughput by evaluating pull requests in parallel. The first one that\nsucceeds becomes the next tip, and all failing requests are rejected back to\ntheir owner. Note that while each failure represents progress, multiple\nsuccesses don’t, since the second successful pull request would still have to\nbe merged with the new tip and tested yet again before it can be integrated.\n  \n  \n    Merging speculation: Simple speculation only helps if you have lots of\nfailed requests that need to be weeded out. If you want to speed things up\nfurther, you can speculate en masse by merging multiple requests together,\nand releasing them if they collectively build and pass all the tests. If the\ngroup fails, you don’t know for sure which of the requests was at fault, so\nyou need to retest some requests individually to ensure forward progress.\n  \n  \n    Faster builds: Simply making your build faster helps a lot. 
We did a lot\nof work in this direction, including writing our own build system,\nJenga, which sped our builds up by\nquite a bit. In addition to making from-scratch builds faster, we also\nworked to make incremental builds reliable. This made it possible for\nchanges that only touched a small fraction of the tree to be verified very\nquickly.\n  \n  \n    Make fewer pull requests: This isn’t always possible, or even advisable,\nbut other things being equal, a workflow with fewer proposed changes will\nget through the build-bot faster.\n  \n\n\nInterestingly, a change in our workflow did massively reduce the number of\nrequests made to the build bot, which really did improve matters for us. This\nderived from a change in our approach to code review. We moved from a (slightly\ncrazy, definitely unscalable)\nsystem\nwhere\nchanges were submitted for integration to the main branch before review, and\nreview of that branch was done later. We moved to a\nsystem\nwhere a feature is kept on a separate branch until it is fully reviewed, and\nonly integrated after.\n\nOne side effect of this change is to batch up the integration requests, so that\nrather than integrate your changes bit by bit, you integrate them en masse when\nthe feature is done. Our main repository went from accepting hundreds of\nrequests a week to just thirty or forty.\n\nThe above tricks can tweak the constants, but don’t change the asymptotics. If\nthe size of our development team grew by a factor of 10, or 100, these fixes\nnotwithstanding, I would expect our approach to the NRS rule to break down.\n\nA different approach to scaling up is to make the release process hierarchical.\nThis is similar in spirit to the Linux kernel development model. 
There, there\nare “lieutenants” who are responsible for individual subsystems and who maintain\ntheir own process for deciding which patches to integrate, and making sure via\nreview and testing that the patches are good enough to be migrated upstream.\n\nIron, our new code review and release management tool, supports something in\nthis vein called hierarchical features. Essentially,\nIron allows you to organize your release process as a tree of release processes.\nBy keeping the branching factor small, you reduce the amount of work that each\nrelease process needs to handle, and thus you can push more patches through the\nprocess. Effectively, release processes lower down in the tree batch together\nfeatures, so they can go through the central process in a single round through\nthe build-bot.\n\nWe already take advantage of this at a small scale, both by organizing\nlong-running release processes, and by spinning up smaller release processes\nwhere an ad-hoc group can collaborate on an interconnected set of features, then\nmerging them together to release them as a unit. Throughout, Iron carefully\ntracks the code review that was done.\n\nThe end result is that the not-rocket-science rule requires a bit more rocket\nscience than it seems at first. Like many ideas, it gets trickier at scale.\n",
        "url"      : "https://blog.janestreet.com/making-never-break-the-build-scale/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Notes on Naming",
        "date"     : "June 29, 2014",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 5,
        "content"  : "I’ve been thinking about naming recently, specifically the naming of new\nsystems. It’s tempting to think of naming as trivial, but it really does matter.\nIn a technology driven organization, names are part of how you communicate about\nthe purpose and nature of your systems. And that communication matters more as\nthe number of people and systems grows.\n\nHere are a few rough guidelines to keep in mind.\n\nAvoid minting new names if you can\n\nEvery name that has to be remembered is a drain on peoples’ mental space. As\nsuch, sometimes the best strategy is not to name something at all, but instead\nto piggyback on an existing name.\n\nA good example of this at Jane Street is catalog, an in-house pub/sub system.\nCatalog has many related systems, including the catalog metadata server, the\ncatalog browser, the catalog logger, and the cateye monitoring system.\n\nBy sticking catalog somewhere in the name, the user is given a good starting\npoint for understanding the system and what it’s about. In the case of catalog,\nwe’ve even embedded many of these components as sub-commands of the catalog\ncommand line tool. 
This is a useful trick, since it makes these pieces easier to\nfind.\n\nWe’ve recently tweaked Core.Command, our command-line parsing library, to make\nit possible to embed completely separate executables within one unified\ncommand-line interface, so that, say, catalog logger could invoke a different\nexecutable than the catalog command-line tool itself, meaning we can get the\nunified discoverability without tying together the release process.\n\nNames should be informative\n\nYour starting point for understanding the nature of a system is often its name.\nNames show up all over the place: documentation, machine names, mailing list\nnames, source directories, config files, emails, and face-to-face conversations.\nA name is an opportunity to help people in all those contexts understand what\nyour system does, and you should make the most of that chance.\n\nThere are two basic ways to pack information into a name: making it\ndescriptive, and making it mnemonic.\n\nDescriptive names are ones that directly say what the system is for. For\nexample, with a little Jane Street context, it’s not too hard to guess that the\nactivity-cache is a database that caches a view of our trading activity, or\nthat the entitlement-monitor is a tool for monitoring the entitlements that\nusers have for seeing marketdata.\n\nMnemonic names, on the other hand, are names that give you a mental hook that\nhelps you remember what a system does, without being enough to tell you the\nmeaning. Mnemonic names are less informative, but they’re often shorter, and can\ncarry a more distinct identity than a blandly descriptive one.\n\nA sense of identity can be useful when you need to distinguish systems with\nsimilar purposes. We’ve had at least 4 different systems that could have been\ncalled alert-monitor. 
Rather than calling them all alert-monitor, or, even\nworse, alert-monitor{1..4}, we gave them mnemonic names that carry distinct\nidentities, like watcher, eye, cateye and oculus.\n\nLess common systems should have more descriptive names\n\nWhen choosing an identifier in code, it’s good practice to pick a longer, more\ndescriptive name when naming a more rarely used thing. You don’t want to waste\nprecious mental space on remembering the meaning of something you interact with\nrarely. Conversely, short names that are less informative make the most sense\nfor very common operations.\n\nThe same principle applies to naming systems. In the systems case, however, it’s\na little trickier to think about what it means for something to be “frequently\nused”. In particular, the audience of people you’re communicating to is more\ndiverse with systems than with code, where your audience is the other developers\nworking on the system. In the systems case, you should think not just about the\nprimary users of your system, but more broadly about the devs, admins, support\nfolk, and really anyone else who interacts incidentally with the system.\n\nHere’s an example. Years ago, we released a system called gord, which is\nresponsible for slurping up trading activity from our trading systems and\ndistributing it to other applications. I don’t think it’s a particularly\ninformative name, but it comes up all the time in people’s day-to-day work life,\nwhich means that the information in the name isn’t as important as the fact that\nit’s short. But a similarly short-and-uninformative name for a rarely used tool\nis more of a problem.\n\nAll this said, descriptive names have downsides too. Descriptive names can be\nlong, and long names impose their own mental cost – at some point people just\nstop being willing to read the paragraph you’ve embedded in the name. 
There are\nalso platform-specific limits to name lengths for things like usernames and host\nnames, which matters if you’re going to have users or hosts named after your\nsystem.\n\nPractice empathy\n\nPicking names is fun, but you shouldn’t let that lead you to waste other\npeople’s time. We avoid picking cutesy names for identifiers in programs because\nit makes the code harder to understand and remember, and you should have similar\nconcerns when picking names for your systems.\n\nThis doesn’t mean you should never pick a fun name. But you should do so\nsparingly.\n\nAvoid churn\n\nNames, once chosen, are hard to change. The name gets itself embedded in more\nplaces than you can imagine, including other people’s brains. Once it’s out\nthere, it’s often better to let it be than to rename it.\n\nBut that very stickiness means you should put some effort into picking a good\nname, since you’ll probably be stuck with it for a long time.\n",
        "url"      : "https://blog.janestreet.com/notes-on-naming/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Reading Lamport, again",
        "date"     : "June 26, 2014",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "We’ve just kicked off an internal distributed-systems seminar. Our inaugral\npaper was Lamport’s classic “Time, Clocks and the Ordering of Events in a\nDistributed System”.\nI remembered the paper fondly, but hadn’t looked back it it for more than a\ndecade.\n\nGoing back and reading it carefully was a fun experience, and I realized that I\nhadn’t really understood the paper properly at the time. Maybe that’s not\nsurprising – Lamport himself likes to complain that no one really understands\nthe main contribution of the paper, which from his perspective is the\nintroduction of state machine replication.\n\n(As a side note, If you come to think that everyone has missed what you believe\nto be the main point of your paper, you should consider the question of whether\nthe paper should have been written differently.)\n\nBut the bit that surprised me is something else, which is the somewhat circular\nway that Lamport motivates the key results in the paper.\n\nA little background. In this paper, Lamport suggests that one should care about\nthe causal-partial-ordering of events in a distributed system, where roughly\nspeaking, e &gt; e' if there is a patch of messages leading from e' to e\n\nLamport then describes a simple algorithm for building causally consistent\nlogical clocks. The key guarantee is that it produces a timestamp C(e) for\nevery event e that has the property that if e &gt; e', then C(e) &gt; C(e').\n\nTo motivate the utility of this clock, he then shows an algorithm for\nimplementing a distributed mutual exclusion algorithm that uses it. On the face\nof it, this seems like a good argument, but if you look closely, there’s\nsomething circular here. In particular, consider Lamport’s mutual exclusion\nspec.\n\nI. A process which has been granted the resources must release it before it can\nbe granted to another process.\n\nII. Different requests for the resource must be granted in the order in which\nthey are made.\n\nIII. 
If every process which is granted the resource eventually releases it, then\nevery request is eventually granted.\n\nThe fairness condition (II) is the interesting one. If you think about it,\nyou’ll realize it’s not quite clear what he means by “the order in which they\nare made”. What ordering is he talking about? He can’t be talking about the\nreal-time order, since if he was, the property would be impossible to achieve:\nimagine two hosts requesting access a nanosecond apart. It’s clear that without\nsynchronized clocks or some other time-like information, there’s no way for the\nsystem to know who went first, and so the outcome of the mutual exclusion\nalgorithm can’t depend on that.\n\nWhat Lamport means, it turns out, is that if request r is causally after\nrequest r', then r'’s request will be granted first.\n\nAnd that’s the circularity. It’s not terribly surprising that if causality is\nbaked into your specification, then you’re going to have to use causality as\npart of your implementation. But that’s not really convincing of much on its\nown, and Lamport doesn’t attempt to motivate why causal fairness is important\nfor a mutual exclusion algorithm.\n\nNone of this is really to take away from the paper. This paper, and all of\nLamport’s work really, is carefully thought through, well argued, and written in\na wonderfully precise manner. And the paper does indeed introduce foundational\nideas (state machine replication being the most important) and a conceptual\nvocabulary that was very influential.\n\nBut it raises the point that when you go and read a paper, even a well-regarded\nclassic, you should approach it with a thoughtful and skeptical eye.\n",
        "url"      : "https://blog.janestreet.com/reading-lamport-again/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Iron out your release process",
        "date"     : "June 24, 2014",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 10,
        "content"  : "This is the third in a series of posts about the design of Iron, our new code\nreview and release management tool. The other two posts\nare here\nand here.\n\nJane Street’s code is (mostly) organized as one big hg repo. Whether this is a\nreasonable approach is a matter of some debate. Facebook, for example, does\nsomething similar, and when they tried to work with the git community to try to\nmake git handle their use-case, they were pretty roundly condemned. After all,\nwho would be crazy enough to put millions of lines of fast-changing code into a\nsingle repo? Why not mint some decent modularity boundaries and use them to\nbreak things up?\n\nI’m all for modularity, but I agree with Facebook on this one. Keeping things in\na single repo has advantages even when your code is modular. In particular, it\nlets you use a single hash to fully describe the code in your system. That\nsimplifies testing and deployment by making it easier to specify what needs to\nbe tested and deployed. And it stops you from having to figure out which\nversions of each of your dependencies you need to build your system. Look ma, no\nSAT solver!\n\nThe one-big-repo approach also simplifies the process of making big,\ncross-cutting changes, since it gives you a way of making an atomic change that\ncrosses a big subset of your code. This makes it easier to modernize your code\nand undo mistakes of the past.\n\nLinear development and its discontents\n\nThe single-big-repo model often comes along with a (mostly) linear release\nprocess. In such a release process, there is a release branch with a single\nhead, and changes are reviewed and then released by applying them as a single\nchangeset to this head.\n\n(Note that by release I’m talking about the process by which a change enters\ninto the source tree and is this made available to other developers. 
This is\ndistinct from rolling software into production, which is how you make the\nresulting software available to users.)\n\nA linear release process is pleasantly simple and has a lot of advantages, but\nit has its limitations too. Complex changes that require significant time to\nreview and refine can be a particular pain point.\n\nDeveloping a complex change as a single patch isn’t ideal. Reading and\nunderstanding a big patch is a lot harder than reviewing a sequence of smaller\npatches, at least if they’ve been broken up into pieces that have some\nconceptual unity.\n\nBut breaking a large change into small, independently reviewed chunks is\nproblematic too, because the individual chunks may leave you in an awkward state\nwhere you’re not ready to roll the resulting code. This violates the desirable\nproperty that the tip of your release branch should be at or close to a rollable\nstate.\n\nSome projects deal with this by encouraging developers to release their\nincremental changes before they’re truly ready for production, but to disable\nthem using so-called feature\ntoggles. For us, this is a\ncomplete non-starter. Introducing untrusted code paths into a production system\nin the hopes that they won’t be used is too terrifying to contemplate. But when\nusing a linear release process, it’s not clear what other choices you have.\n\nHierarchical features\n\nIn Iron, we use hierarchical features as a structured way of managing a\nnon-linear release process. Hierarchical features are quite new for us, and so\nwe’re still learning our way around them, but so far they seem to add quite a\nbit of flexibility, without sacrificing our ability to reason about and manage\ncode review.\n\nThe basic idea is simple. Every repository has a single root feature, and every\nother feature is created as the child of an existing feature (possibly the root\nfeature). 
Here, for example, is the current sub-tree corresponding to the\ndevelopment of Iron itself.\n\n[21:50:31 ~]$ fe list jane/fe -display-ascii\n|------------------------------------------------------|\n| feature                                      | lines |\n|----------------------------------------------+-------|\n| jane                                         |       |\n|   fe                                         | 26    |\n|     backups                                  | 340   |\n|     color-enabled-features-in-fe-list        | 113   |\n|     comments-and-formatting                  | 1     |\n|     disallow-archiving-permanent-features    | 13    |\n|     do-not-show-jane-from-cr-CR-soons        | 5     |\n|     fact                                     | 979   |\n|     fix-int-alignment                        | 14    |\n|     fix-tests                                | 24    |\n|     improve-fe-session                       | 24    |\n|     improve-release                          | 20    |\n|     improve-review-not-enabled-error-message | 9     |\n|     not-releasable-if-pending                | 30    |\n|     use-async-process                        | 41    |\n|------------------------------------------------------|\n\n\n\nThere are two basic operations that are used to manage this hierarchy: rebase\nand release.\n\nReleasing a feature takes the changes in that feature and propagates them to the\nparent feature. Consider for example the feature jane/fe, which corresponds to\nthe most recently released version of Iron, and jane/fe/backups, which is the\nfeature where Iron’s backup support is being developed. jane/fe/backups is\nreleasable if:\n\n\n  the base revision of jane/fe/backups is the tip of jane/fe\n  the review of jane/fe/backups is complete.\n  there are no outstanding issues in the code to be resolved. 
Issues are\ntracked through specially formatted comments in the source.\n\n\nWhen you run fe release jane/fe/backups, the tip of jane/fe is advanced to\nthe tip of jane/fe/backups. If jane/fe/backups has no children of its own,\nthen as a convenience the bookmark is automatically removed and the feature is\narchived.\n\nWhere releasing moves changes from the child to the parent, rebasing does the\nopposite. Rebasing is useful in the case where the parent feature has changed\nsince the child feature was branched. For example, imagine that after\njane/fe/backups was created, a different feature, jane/fe/fix-tests was\nreleased into jane/fe. That means that the tip of jane/fe was changed to\nincorporate the changes in fix-tests.\n\nIf you run fe rebase jane/fe/backups, it will merge the tip of\njane/fe/backups with jane/fe and update jane/fe/backups to that bookmark.\nIt also changes the base revision of jane/fe/backups to be the tip of\njane/fe. The end result is that jane/fe/backups will now reflect the\nreleased changes from jane/fe/fix-tests.\n\nCalling this operation a rebase is a little confusing, because the underlying\noperation on the hg graph is a merge, not a rebase. But it’s not a changeset in\nthe hg graph that’s being rebased.\n\nRebase and release turn out to be the basis for a rich toolset for building\ndifferent release workflows. Here are a few examples.\n\nDependent features\n\nIt’s often useful to develop a chain of dependent features, each feature using\nfunctionality developed by the feature before it. This can make it easier to\ndevelop bigger changes as a collection of smaller and easier to verify pieces.\n\nAs an example, let’s say I want to add a new function to the List module. 
I\ncan mint a feature for that.\n\n$ fe create jane/List.fold_until -description \"\n    Add fold_until to List, which is a fold with an\n    explicit stopping condition\"\n\n\n\nOnce I commit and push, I might want to develop another feature that uses\nList.fold_until. I can start that dependent feature immediately (even before\nList.fold_until is released) by making my next feature a child of\nList.fold_until.\n\n$ fe create jane/List.fold_until/clean-up-fe-show \\\n    -description \"\n       clean up the fe show code by using List.fold_until\n       instead of an exception to terminate fold\"\n\n\n\nNow, clean-up-fe-show can be developed and reviewed on its own. Note that if\nList.fold_until changes in the meantime, there’s no real problem. We just need\nto rebase clean-up-fe-show to take those changes into account. And we now have\nthe choice of releasing List.fold_until first, or of releasing them together,\nby first releasing clean-up-fe-show into List.fold_until, and then releasing\nList.fold_until.\n\n$ fe release jane/List.fold_until/clean-up-fe-show\n$ fe release jane/List.fold_until\n\n\n\nLocalizing releases\n\nSometimes, it just doesn’t make sense to have one release process for your\nentire tree. For example, you might have a project that needs a certain amount\nof user-testing before you’re comfortable doing a release. That testing may\nuncover small changes you need to make along the way. But releasing those\nchanges to the tip of the root feature may require you to pull in unrelated\nchanges that would invalidate more of your testing.\n\nWith Iron, we handle this by minting a feature that corresponds to the release\nprocess for the project in question. That way, you get full control over how\nchanges move into your feature. For developing Iron itself, releases are done\nout of the jane/fe feature. The owners of that feature decide which of the\ndescendants of jane/fe get released into it. 
And they can rebase jane/fe\nwhenever they’re ready to pull in the most recent changes that have hit jane.\n\nAfter rebasing, they can also release the changes that were done within\njane/fe into jane, so others can benefit from them. All of this is under the\nexplicit control of the owners of jane/fe, which allows them a lot of\nindependence, while still giving them tools to easily integrate their work with\nothers’.\n\nContinuous release\n\nDoing rebases and releases by hand can be a drag, especially for a feature like\njane, which is released into quite often.\n\nTo deal with this, we have a continuous release process for jane that is\nautomated by Hydra, our build-bot. When a feature is proposed for release, Hydra\ndoes the following:\n\n\n  Rebases the feature against its parent\n  Checks that the resulting feature builds and all tests pass\n  Calls fe release, which does its own checks as described earlier,\nincluding checking that the feature is fully reviewed.\n\n\nIf anything fails, the submitter is emailed with the details, and Hydra will try\nagain if there are any changes either to the feature itself or to its parent.\n\nGiven that we do releases out of features other than jane, there’s no reason\nnot to support continuous releases in those places too. Accordingly, Iron\nlets you mark any feature as a continuous release feature, which signals to\nHydra that it should run the continuous release process for the feature in\nquestion.\n\nWhy Iron matters\n\nIron is a pretty big improvement for us. Part of this is just that the system it\nreplaces was old and crufty and had significant performance problems. But it’s\nmore than that. Iron does a good job of breaking down the sometimes messy\nprocess of reviewing and releasing software into simple, easy to understand\npieces. 
Given how fundamental these processes are to the act of developing\nsoftware, I think that this is a real contribution.\n\nThe software itself is going to be open-sourced in the next month or two, but\nI’m hopeful that the ideas behind Iron will have some value as well. I’ve been\nstruck in the last decade of working as a professional software engineer by how\nlittle is written about these issues. Outside of organizations that have built\nhigh quality tools to solve these problems internally, I think these questions\nare not that widely understood.\n",
        "url"      : "https://blog.janestreet.com/ironing-out-your-release-process/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Scrutinize your code in style",
        "date"     : "June 13, 2014",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 5,
        "content"  : "This is the second in a series of posts about the design of Iron, our new code\nreview tool. You can read the first post here.\nAlso, I should give credit where credit is due. While I’ve been involved in some\nof the design discussions, the real work has been done by Stephen Weeks,\nValentin Gatien Baron, Olin Shivers (yes, that Olin Shivers. He’s joining us for\npart of his sabbatical) and Mathieu Barbin.\n\nJane Street’s code review tools and culture were built around a small subset of\nour systems which we descibe as critical path. These systems have some role in\nthe carrying of an order to the market, and we’re pretty paranoid about getting\nthings wrong there.\n\nOut of that paranoia, we set up some rules for ourselves. In particular:\n\n\n  Every new feature has an assigned seconder, who is intended to act as a\nkind of co-designer of the feature. A seconder shouldn’t approve a feature\nuntil he believes in the feature enough to defend it to others.\n  Individual files also have assigned reviewers. Every file has three assigned\nreviewers. None of these reviewers except the seconder should start\nreviewing until the seconder has finished.\n  The level of scrutiny is high. Reviewers should read minutely and carefully\nfor coding style, clarity and correctness, as well as make sure the changes\nare adequately tested and documented.\n\n\nThis reviewing style is reasonable for our safety-critical systems, but is also\npretty expensive to maintain. As such, it just doesn’t make sense for every\npiece of code we write.\n\nThe way we handled this early on was that we divided the world into two parts:\nthe critical path, where we did this high level of review; and everything else,\nwhere we did no review at all.\n\nPerhaps unsurprisingly, this was a mistake. Code review is valuable even for\ncode that is not safety-critical. All too often, unreviewed parts of the tree\nturned into swamps that had to be drained later. 
But the high level of scrutiny\nwe apply to critical path systems just doesn’t make sense everywhere. The only\nreasonable solution is to apply different standards to different codebases.\n\nAs one example, we have a set of tools developed by the trading desks. These are\nlargely rough-and-ready tools for aggregating and displaying information to the\ntraders. They’re a lot lower risk than the trading systems, and, accordingly,\nthe review style we use is quite different. In particular:\n\n\n  Every feature has a seconder. The seconder should look over the feature to\ncheck for style issues and to check the rough sanity of the change.\n  File reviewers will be consulted only after the seconder approves, but file\nreviewers are not mandatory, and are pretty rare.\n\n\nWhile we have different review styles for different codebases, our existing\ntools didn’t support this explicitly. Everything hinged on users\nremembering which standard they were supposed to apply to different parts of the\ntree.\n\nIron was designed from the beginning with the goal of supporting multiple review\nstyles within the same repository. These styles are codified in Iron’s\nobligations files, which describe the reviewing styles and define which part\nof the tree is done in which style.\n\nEach style of review is codified as what is called a scrutiny. 
A scrutiny\nconsists of:\n\n\n  A name, which is shown to reviewers as a mnemonic while they’re reviewing.\n  A description, which is meant to explain the goals the review should keep in\nmind for this review scrutiny.\n  The number of required reviewers.\n  A scrutiny level, which is simply an integer that is meant to indicate the\nintensity of the scrutiny.\n\n\nIron uses this scrutiny information to notify the reviewer as to the level of\nscrutiny they’re supposed to apply, as well as to make sure that rules about the\nnumber of reviewers are followed.\n\nThings get more complicated, though, when a file changes from one scrutiny to\nanother. Iron of course notifies the user about a change, but importantly, if\nthe scrutiny of a file goes up, Iron will ask the reviewers to re-review from\nscratch as well.\n\nTracking scrutiny properly can be pretty tricky. Consider again the case of Bob\nand Carla, from my previous post. If you remember, Carla got her feature\nreleased first, while Bob was still getting his feature reviewed. Now, for Bob\nto release, he needs to merge his feature with Carla’s. Here’s a diagram of the\nrevisions involved.\n\n  Fb'\n  / \\\n /   \\\nFc    Fb\n \\   /\n  \\ /\n   B\n\n\n\nHere, B-&gt;Fc is Carla’s feature, B-&gt;Fb is Bob’s feature, and Fc-&gt;Fb' is the\nrebased version of Bob’s feature.\n\nNow, what if Carla’s feature increased the scrutiny on foo.ml, and Bob’s\nfeature modified foo.ml? That means that in Carla’s feature, foo.ml was reviewed\nfrom scratch at the higher level of scrutiny, but in Bob’s feature, the changes\nto foo.ml were reviewed at the lower level.\n\nAs a result, the two features aren’t safe to release together, since it would\nallow the release of a low-scrutiny change to a high-scrutiny piece of code.\n\nIron gets this case right by requesting that the reviewer reread Fc-&gt;Fb' at\nthe higher level of scrutiny. 
This is just one among many tricky corner cases,\nand we’ve handled those in Iron by actually creating a simple logic, inference\nrules and all, which allows us to determine when a given feature should count as\nreviewed.\n\nAll of this underlines that when you take code review seriously, things get\ncomplicated, and you need to really amp up your tools to manage that complexity.\n\nThat’s all I wanted to say about scrutiny. In the next post, I expect to cover\nhierarchical features, which are a surprisingly fluid tool for managing complex\nworkflows in Iron. But more on that later.\n",
        "url"      : "https://blog.janestreet.com/scrutinizing-your-code-in-style/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Code review that isn't boring",
        "date"     : "June 12, 2014",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 7,
        "content"  : "At Jane Street, we care a lot about code review. We think that high quality\ncode, and in particular, readable code, helps us maintain the safety of our\nsystems and keeps things simple and clean enough for us to stay nimble.\n\nBut code review is hard to do without some kind of automation. We have an\nexisting tool called cr that we’ve used for a long time. I posted a\ndescription of the initial design (here\nand here) back in\n2009, and posted an update last year\non how we’d revised our approach to review since then.\n\nBut cr is showing its age. It has significant problems, both in terms of the\nworkflow it implements, and in terms of performance. cr itself is, ironically,\na fairly messy piece of code, so rather than fix it, we’ve been working for the\nlast few months on a brand new replacement called Iron, or fe for short. We’ve\njust started rolling Iron out, and it’s a big improvement, and represents a\nsignificant step up in our understanding of how to organize review. It isn’t\nreleased publicly yet, but we will open source it after it stabilizes a bit\nmore.\n\nThis is the first in what I expect to be a series of posts about Iron. In this\none, I’d going to try to explain why there’s a problem to be solved in the first\nplace, and I’ll do that by walking through some examples that should underline\nwhy this is a tricky problem.\n\nScenario 1: Simple patch review\n\nOne simple approach to code review is patch review. Here’s how it might work\nin a simple case.\n\n\n  Bob develops a new feature\n  He creates a patch, and shows the patch to Alice.\n  Alice reviews the patch, and if it looks good, she approves it.\n  The patch is accepted, and applied to the tree, making a new revision.\n\n\nThat’s simple enough, but what happens if Bob makes some changes in response to\nAlices’ review? How does Alice become confident in those changes? 
One common\nanswer is that she should simply read the new patch from scratch.\n\nBut reading the new patch in its entirety violates one of our design principles\nfor Iron: don’t be boring. Humans are bad at carefully thinking about boring\nthings, and rereading a patch that hasn’t changed much since you saw it last is\nincredibly boring.\n\nThere’s a simple workaround here. Rather than reread the whole patch, Alice can\njust reread the bits that changed. Here’s a simple picture to help illustrate:\n\nF2\n|\nF1\n|\nB\n\n\n\nHere, B is the state of the code when Bob created his feature, or the base\nrevision of his feature. F1 is the state of the feature when he handed the\npatch, B-&gt;F1, to Alice. And F2 is the state of the feature after he\nresponded to Alice’s request.\n\nWith this in mind, it’s easy to see that Alice has the option of reading\nF1-&gt;F2, which just contains Bob’s fixes, rather than B-&gt;F2, which has\nboth Bob’s original changes plus his fixes.\n\nScenario 2: Concurrent patch review\n\nPatch review gets a bit more complicated when more people get involved.\n\nConsider what happens if Carla has her own feature, and let’s say she managed to\nget it released before Bob did. That means that Bob’s patch will need to be\napplied to a different version of the code than the one he based his\nfeature on. Here’s a picture to illustrate the situation.\n\nFc     Fb\n \\    /\n  \\  /\n   B\n\n\n\nHere Fc is Carla’s feature, and Fb is Bob’s. Given that Fc has been\nreleased, Bob needs to make his feature a descendant of Fc in order to release\nit too.\n\nIf the diff B-&gt;Fb applies cleanly to Fc, then that’s easy enough: we can\njust apply Bob’s patch to Carla’s released feature and call it a day. This is\neffectively a way for Bob to merge his patch with Carla’s.\n\n  Fb'\n  / \\\n /   \\\nFc    Fb\n \\   /\n  \\ /\n   B\n\n\n\nThis raises some natural concerns. 
For one thing, Alice didn’t read Bob’s patch\nin a vacuum: she presumably read it in the context of the state of the codebase\nas it was when the patch B-&gt;Fb was created. Now that the patch is being\napplied to Fc, how does Alice know that it still makes sense? The fact that\nthere are no textual conflicts is no guarantee.\n\nThe answer is that Alice doesn’t know for sure. The only principled way around\nthis would be for Alice to take a lock on the repository as she reads Bob’s\npatch, and not let anything (including Carla’s patch) jump ahead.\n\nBut this way madness lies. Development at scale requires parallelism, and so you\nneed to allow concurrent review. In the end, all we can do is to try to cover\nfor this in other ways, in particular by using tests and types to make\naccidental breaking of the code from such a situation less likely.\n\nScenario 3: Conflicted patch review\n\nWhat if Bob’s patch doesn’t apply cleanly to Carla’s release? In this case, Bob\nneeds to create a new revision that merges together Carla’s changes with his\nown, say, by applying the patch B-&gt;Fb to Fc, and resolving the conflicts by\nhand. Alternatively, Bob could use the merge algorithm from his version control\nsystem to do the merge, but again conflicts must be resolved by hand.\n\n  Fb'\n  / \\\n /   \\\nFc    Fb\n \\   /\n  \\ /\n   B\n\n\n\nGiven that Alice has already read B-&gt;Fb, how does she become comfortable with\nFc-&gt;Fb'?\n\nShe could read the diff Fc-&gt;Fb', but that would violate our boredom\nprinciple, since Fc-&gt;Fb' will generally look a lot like B-&gt;Fb. Reading the\ndiff Fb-&gt;Fb' is no better, since this will largely be a recapitulation of the\nchanges made by Carla in B-&gt;Fc.\n\nOne way out of this boredom trap is to read the diff-of-diffs, or ddiff. In\nother words, Alice can read (B-&gt;Fb)-&gt;(Fc-&gt;Fb'). 
By reading this, she can get a\nview on how the feature changed.\n\nNow, diffs-of-diffs can be hard to read, and you only want to read them when you\nabsolutely have to. That’s why we wrote a tool called patdiff4 for use with\nIron. patdiff4 does a hunk-by-hunk analysis of the code in the four revisions\nof the merge diamond, and tries to present each hunk in the simplest way\npossible, sometimes using ddiffs, but using other simpler visualizations where\npossible.\n\nRebasing review\n\nDealing with review of conflicted merges is at the heart of Iron. In Iron, we\nrefer to this as a rebase, though it’s not necessarily a rebase of the kind you see\nin git or mercurial. After all, from the point of view of the version control\nsystem, you might be doing a merge rather than a rebase. But from Iron’s\nperspective, the goal is to rebase the review. Thus, after a rebase, Iron thinks\nof Alice as having approved of Fc-&gt;Fb' even though she read B-&gt;Fb and then\nlater read the ddiff (B-&gt;Fb)-&gt;(Fc-&gt;Fb').\n\nRebasing isn’t the end of the Iron story. In future posts, I’ll discuss the two\nother main ideas that animate Iron: Iron’s management of scrutiny, which is\nhow it keeps track of the level of scrutiny that a user is supposed to apply to\na given review; and hierarchical features, a tool for handling complex\nworkflows, including dealing with features that depend on each other.\n",
        "url"      : "https://blog.janestreet.com/code-review-that-isnt-boring/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "OCaml 4.02: everything else",
        "date"     : "May 18, 2014",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 8,
        "content"  : "This is the last in my series of posts about new features in OCaml 4.02. So far,\nI’ve discussed how OCaml is getting more like Lisp because\nof\nextension points,\nhow module aliases will massively\nspeed up compilation of Core and similar libraries, and how you can simplify\nyour error handling\nby\ncatching exceptions with match statements.\n\nBelow, I’ve summarized the other features that strike me as worth mentioning,\nbut don’t seem worth their own blog post.\n\nGoodbye Camlp4\n\nThis is in some sense related to the extension-points change, but camlp4 has\nbeen evicted from the compiler, and is now its own independent project, with\nJeremie Dimino as the primary maintainer.\n\nI think this is good news for OCaml and for camlp4. Updating camlp4 to match new\nfunctionality in the compiler is hard, and in the past, new compiler releaes\noften came out with camlp4 broken in some subtle way. (In 4.01.0, for example,\nthe new open! syntax was broken in camlp4).\n\nMarrying camlp4 and OCaml together slows OCaml down, and also means that when\nOCaml gets released, we often get stuck with a broken camlp4. Now, because they\nhave separate release cycles, it will be possible to fix camlp4 bugs when they\narise, meaning we won’t get stuck with incompatible bugs for long.\n\nA key enabler of this disentanglement is opam. Having a decent package manager\nmakes it simpler and easier to deal with a more disaggregated world. Hopefully,\nthis will lead to a nimbler compiler development process.\n\nOpen Types\n\nOrdinary OCaml variants are closed, which is to say that once you define a\nvariant, you can’t extend it with new cases. But OCaml does have another\nalmost-variant that’s open, which is to say you can add new variants to it\nafter it has been defined: the exn type.\n\nOpen types are useful for more than just exceptions, though. 
They tend to come\nup in certain kinds of modular designs where you want a single type to act as a\nkind of meeting point between values that come from different places.\n\nIn the past, when we’ve needed open types, we’ve basically abused the exception\ntype to get this functionality. In 4.02, you can simply declare new open variant\ntypes. Interestingly, these new open types are a bit more powerful than the old\nexception model, in that a new open type can have type parameters, and the\nconstructors can be GADTs.\n\nBetter format with GADTs\n\nOCaml’s printf is both great and terrible. It’s great, because it gives you\na type-safe way of dealing with format strings.\n\n# printf \"a string: %s, an int: %i\\n\" \"three\" 3;;\na string: three, an int: 3\n- : unit = ()\n# printf \"a string: %s, an int: %i\\n\" \"three\" 3.5;;\nCharacters 44-47:\n  printf \"a string: %s, an int: %i\\n\" \"three\" 3.5;;\n                                              ^^^\nError: This expression has type float but an expression was expected of type\n         int\n\n\n\nThis type-safety comes with a bit of complexity, though. First, OCaml has to parse\nthe format string at compile time and convert it to an object that understands\nthe types of the values it needs to consume. That’s not so bad, but\nunfortunately, before 4.02, this was done with a special-purpose type that\ndidn’t fit neatly into the type system. Perhaps because of this, there have been\nmany bugs over the years associated with format types.\n\nIn addition, printing with format types was horribly slow. OCaml 4.02 solves\nboth of these problems with a rewrite of the format types on top of GADTs.\n\nImmutable strings\n\nThis one was a surprise. One unfortunate bit of historical cruft in the language\nis that the default string type in OCaml is mutable. 
Nobody really likes this,\nbut it seemed too painful to change, since changing it would obviously break\nlots of old code.\n\nWhat the Caml team did instead was to make it possible to make strings\nimmutable. In particular, there is a new module Bytes which is intended for\ndealing with mutable byte buffers, whose underlying type Bytes.t is the same\nas String.t. And there’s now a flag which when you turn it on, breaks the type\nequality between Bytes.t and String.t, and also disables the mutation\noperators in String. This gives us a migration path towards making strings\nimmutable. It will take a while for it to push through, but I do expect lots of\npeople to make the flip, including us at Jane Street.\n\nGenerative functors\n\nFor you SML fans, OCaml now has generative in addition to applicative functors.\nApplicative functors have the property that when run repeatedly on the same\ninput module, they generate the same types in the output. This is sometimes\nuseful, but it’s sometimes not at all what you want. For example, consider this\ncase.\n\nmodule Unique_id (Unit : sig end) : sig\n   type t\n   val allocate : unit -&gt; t\nend = struct\n   type t = int\n   let id = ref 0\n   let allocate () = incr id; !id\nend\n\n\n\nThis is supposed to generate a new unique-id module with a distinct type every\ntime it’s called. But if you call it on the same module, you’ll get the same\ntype, which is totally wrong, as you can see:\n\n# module Empty = struct end;;\nmodule Empty : sig  end\n# module Id1 = Unique_id (Empty);;\nmodule Id1 : sig type t = Unique_id(Empty).t val allocate : unit -&gt; t end\n# module Id2 = Unique_id (Empty);;\nmodule Id2 : sig type t = Unique_id(Empty).t val allocate : unit -&gt; t end\n# Id1.allocate () = Id2.allocate ();;\n- : bool = true\n\n\n\nThis is clearly not what we want. 
If we used different (but identical) modules\nas inputs, however, we would have had no problem.\n\n# module Id1 = Unique_id(struct end);;\nmodule Id1 : sig type t val allocate : unit -&gt; t end\n# module Id2 = Unique_id(struct end);;\nmodule Id2 : sig type t val allocate : unit -&gt; t end\n# Id1.allocate () = Id2.allocate ();;\nCharacters 18-33:\n  Id1.allocate () = Id2.allocate ();;\n                    ^^^^^^^^^^^^^^^\nError: This expression has type Id2.t but an expression was expected of type\n         Id1.t\n\n\n\nGenerative functors work like the second case every time, which for this kind of\nfunctor makes more sense. We can mark a functor as generative by having a dummy\nargument of the form (). So, we can redo our example as follows:\n\nmodule Unique_id () : sig\n   type t\n   val allocate : unit -&gt; t\nend = struct\n   type t = int\n   let id = ref 0\n   let allocate () = incr id; !id\nend\n\n\n\nAnd now, every invocation of this functor produces a fresh type.\n\n# module Id1 = Unique_id ();;\nmodule Id1 : sig type t val allocate : unit -&gt; t end\n# module Id2 = Unique_id ();;\nmodule Id2 : sig type t val allocate : unit -&gt; t end\n# Id1.allocate () = Id2.allocate ();;\nCharacters 18-33:\n  Id1.allocate () = Id2.allocate ();;\n                    ^^^^^^^^^^^^^^^\nError: This expression has type Id2.t but an expression was expected of type\n         Id1.t\n\n\n\nThe other benefit of generative functors is that they lift the annoying\nrestriction on unpacking first class modules within applicative functors.\n\nOptimizations\n\nThere are a few good optimizations that landed. One of them derived from work\ndone by Phil Denys, who was an intern at Jane Street when he implemented some\ndivision-by-a-constant optimizations. Another came from our own Vlad Brankov,\nwho eliminated some unnecessary float boxing associated with let bindings. 
And\nthere are a number of other ones, improving the compilation of optional arguments,\naccess to values in nested modules, and more. We’ll see the results of these\nmore clearly when we get to building our whole tree with the new compiler and\nrunning our benchmarks.\n\nSumming up\n\nThat’s not quite everything, but it’s close. Notably, there’s the usual\ncollection of small bugfixes and tweaks which didn’t seem worth mentioning\nindividually. But really this covers most of the interesting changes.\n\nAll told, it’s a pretty serious release. I think it’s a sign of how much energy\nis being poured into the language. Indeed, the speed of change is high enough\nthat it raises other concerns: is OCaml moving too fast? Is it accreting\nfeatures at such a rate that the language is going to get too complicated?\n\nI think the answer is no. The changes that have been coming seem to me to be\noverwhelmingly thoughtful and conservative. Indeed, some of the changes, like\nextension points, or the new GADT-based format strings, are really\nsimplifications.\n\nThere’s still some time until this all gets released. There are bugs that are\nbeing actively tracked down, and there’s a lot of work to be done to test this\nrelease. But from what I understand, we should see a final release some time\nthis summer.\n",
        "url"      : "https://blog.janestreet.com/ocaml-4-02-everything-else/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Pattern matching and exception handling, unite!",
        "date"     : "May 17, 2014",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "(OCaml 4.02 has branched, which makes it a good time to stop and take a look at\nwhat to expect for this release. This is part of a series of posts where I’ll\ndescribe the features that strike me as notable. This is part 3. You can also\ncheck out parts 1\nand 2.)\n\nThis one is a modest improvement, but a nice one.\n\nHere’s a simple bit of code for reading a file line-by-line in OCaml.\n\nlet read_lines inc =\n   let rec loop acc =\n     try\n       let l = input_line inc in\n       loop (l :: acc)\n     with End_of_file -&gt; List.rev acc\n   in\n   loop []\n\n\n\nBut the above code has a problem: it’s not tail recursive, because the recursive\ncall to [loop] is within the exception handler, and therefore not a tail call.\nWhich means, if you run this on a sufficiently large file, it will run out of\nstack space and crash.\n\nBut there’s a standard way around this problem, which is to wrap just the\ninput_line call with try_with, and then pattern match on the result. That\nwould normally be done like this:\n\nlet read_lines inc =\n   let rec loop acc =\n     match (try Some (input_line inc)\n            with End_of_file -&gt; None)\n     with\n     | Some l -&gt; loop (l :: acc)\n     | None -&gt; List.rev acc\n   in\n   loop []\n\n\n\nThis is an OK solution, but it has some warts. In particular, there’s the extra\noption that gets allocated and immediately forgotten, which can be problematic\nfrom a performance perspective. Also, the nesting of the try/with within the\nmatch is a bit on the ugly side.\n\nThat’s where handler-case comes in. Essentially, in 4.02 the match statement\nand the try-with statement have been combined together into one. Or, more\nprecisely, the match syntax has been extended to allow you to catch exceptions\ntoo. 
That means you can rewrite the above as follows.\n\nlet read_lines inc =\n   let rec loop acc =\n     match input_line inc with\n     | l -&gt; loop (l :: acc)\n     | exception End_of_file -&gt; List.rev acc\n   in\n   loop []\n\n\n\nThis is both more concise and more readable than the previous syntax. And the\ncall to loop is tail-recursive, as one would hope.\n\n(While the above is a good example, it’s not what you’d\nwrite to solve this problem in practice. Instead, you might use Core’s\nIn_channel.fold_lines, as follows:\n\nlet read_lines inc =\n   In_channel.fold_lines inc ~init:[] ~f:(fun acc l -&gt; l :: acc)\n   |&gt; List.rev\n\n\n\nOr you could just call In_channel.read_lines!)\n",
        "url"      : "https://blog.janestreet.com/pattern-matching-and-exception-handling-unite/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Core_bench: better micro-benchmarks through linear regression",
        "date"     : "May 17, 2014",
        "authorId" : "rjames",
        "author"   : "Roshan James",
        "tags"     : [],
        "minsToRead" : 16,
        "content"  : "This post is meant to be an introduction to\nCore_bench, our\nmicro-benchmarking library for OCaml. Core_bench is similar to Haskell’s\nmicro-benchmarking library,\nCriterion,\nin that it serves the same overall purpose. It is however not a direct port of\nCriterion to OCaml, but instead employs a different approach to estimating costs\nthat we think yields better results.\n\nWe think the result is a benchmarking tool whose results are easier to\ninterpret, and as we’ll see later give us better intuitions as to how much of\nthe time spent by our code can be explained by garbage collection.\n\nCore_bench is also meant to be easy to use. The source is publicly available on\ngithub and the library can be\ninstalled by running opam install core_bench.\n\nHere is an example program that uses Core_bench:\n\nopen Core.Std\nopen Core_bench.Std\n\nlet () =\n  Command.run (Bench.make_command [\n    Bench.Test.create ~name:\"id\"\n      (fun () -&gt; ());\n    Bench.Test.create ~name:\"Time.now\"\n      (fun () -&gt; ignore (Time.now ()));\n    Bench.Test.create ~name:\"Array.create300\"\n      (fun () -&gt; ignore (Array.create ~len:300 0))\n  ])\n\n\n\nLet’s ignore the Bench.make_command for a moment and focus on the calls to\nBench.Test.create. Each benchmark consists of a name and a function of type\nunit -&gt; unit. When you run the program you see something like:\n\nName                Time/Run   mWd/Run   mjWd/Run\nid                    3.08ns\nTime.now            843.82ns     2.00w\nArray.create300   3_971.13ns              301.00w\n\n\n\nThis short program already produces some interesting outputs. It says that the\ncost of calling an empty thunk is estimated to be about 3 nanos on my old\nlaptop. Time.now takes 843 nanos and allocates 2 words on the minor heap due\nthe fact that it returns a boxed float (one word is 64 bits on a 64 bit\nmachine). 
Array.create300 allocates 301 words directly into the major heap.\nThe array has length 300 and the one additional word is for the array block\nheader. Some experimenting shows that arrays of length 256 or larger are\nallocated directly into the major heap.\n\nBench.make_command gives the program a rich set of command line options. As we\nwill explain below, Core_bench uses linear regression for estimating the costs.\nThere are command line options to inspect various error estimates which include\n95% confidence intervals calculated via bootstrapping and the goodness of fit,\nR\\^2, for the\nregression. One can also specify the time quota allowed for benchmarking from\nthe command line. The more data samples the program can collect, the better its\nestimates will be. For short-lived functions of the order of microseconds,\nbenchmarking time of about a second or so produces estimates with pretty tight\n95% confidence intervals. If not enough data was collected for a good estimate,\nthen one would typically see a wide 95% confidence interval.\n\nMicro-benchmarks, like the ones above, estimate the cost of running a small,\nindependent piece of code. Micro-benchmarking isn’t the right tool for finding\nthe hot spots in your program – for that you should use profiling tools like\nthe excellent perf suite for Linux. But it is useful for analyzing the cost of\na given operation. Micro-benchmarking is useful both for designing the bits and\npieces of your libraries, and for building up a rough mental model of the cost\nof different operations. All of this is important to do well if you’re going to\nwrite efficient code.\n\nNow that we have a taste of what the library does, let’s turn our attention to how\nit works.\n\nHow Core_bench works\n\nMicro-benchmarking is harder than it sounds, for a few reasons. 
Let’s start by\nconsidering the following naive approach to measuring the execution time of a\nfunction f.\n\nlet t1 = Time.now () in\n  f ();\nlet t2 = Time.now () in\nreport (t2 - t1)\n\n\n\nWhen f () is a short-lived computation, the above t2 - t1 is erroneous for\ntwo reasons:\n\n1: Time.now is too imprecise. The usual resolution of time returned by\nTime.now is of the order of 1 microsecond. Consequently the execution time of\nf () gets lost in the noise.\n\n2: Time.now is an expensive function that takes a while to execute and\ntypically requires control transfer to a VDSO. (On my older laptop this takes\n800+ nanos to run. On more expensive server class machines, I have seen numbers\nas low as 40 nanos.)\n\nThe usual approach to minimizing the measurement error of an imprecise timer is\nto run f () many times. Thus the pseudo-code becomes:\n\nlet t1 = Time.now () in\nfor i = 1 to batch_size do\n  f ();\ndone;\nlet t2 = Time.now () in\nreport batch_size (t2 - t1)\n\n\n\nHere we select a batch_size such that the time taken to run the inner loop is\nsignificantly more than the errors inherent in time measurement. To compute the\nright value of batch_size, one needs to run Time.now () repeatedly to build\nan estimate of both the execution time and the precision of the timing\ncapabilities of the machine. Subsequently, one needs to run f () repeatedly to\nestimate its approximate cost. Once both the estimates are available then one\ncan select batch_size such that the errors introduced by timing are a\nsignificantly smaller fraction (say, less than 1%) of the time taken to execute the for\nloop above. Other micro-benchmarking libraries, notably Haskell’s Criterion, do\nthis.\n\nWhile the above strategy compensates for timing errors, it does not account for\nerrors that show up due to system activity. 
To account for noise in the system,\nwhat Criterion does is collect many samples at the selected\nbatch_size:\n\nfor j = 1 to samples do\n  let t1 = Time.now () in\n  for i = 1 to batch_size do\n    f ();\n  done;\n  let t2 = Time.now () in\n  report batch_size (t2 - t1)\ndone\n\n\n\nOnce the samples have been collected the mean, standard deviation and various\nother stats can be reported to the user. If there was any significant system\nactivity while the benchmark was running, the affected samples will show up as\noutliers in the data set. Criterion additionally provides several nice\nvisualizations so that the user can see outliers caused by system\nnoise and such. See the following blog entry about the design of\nCriterion.\n\nSo are we done? It turns out that a straightforward implementation of the above\nin OCaml results in benchmark numbers that show a large amount of variance\nbetween runs. A major source of this variance is the delayed cost of GC.\nTo understand this better, consider the graph below which is a plot of execution\ntime versus batch_size:\n\n\n\nThe above graph corresponds to repeated sampling of the following f at various\nbatch sizes:\n\nlet f () = ignore(Array.create ~len:300 0)\n\n\n\nIn this graph, the y-axis shows time taken per sample and the x-axis is the\nbatch_size of the sample. The wave-like pattern that emerges here is due to\nGC. There are points along the x-axis where the execution time is clearly\nbi-modal or tri-modal, while there are points where there is very little\nvariance at all. The points along the x-axis with higher variance correspond to\nruns where f was run enough times that it triggered either n or n+1\n(sometimes n+2) GCs. The points with lower variance correspond to batch sizes\nwhere f triggered almost the same number of GCs in each sample.\n\nThe interesting thing to note about the above graph is that GC effects have a\nperiodicity. 
This periodicity comes from the fact that repeated runs of f ()\nallocate memory at a fixed rate. Looking at the graph, it becomes clear that the\nexecution times of functions that allocate memory are influenced by the choice\nof batch size. If we could somehow select batch sizes that had low variance then\nwe could minimize our error estimates. But in general this is not easy since the\nshape of the above graph and regions of low variance are specific to the\nfunction being benchmarked. So how can we reduce our error estimates?\n\nHere is the important insight that leads to reducing the error estimates: If we\nchoose any one batch size we might get unlucky and get one with high error\nestimates or get lucky and pick one that gets us tight error estimates. However\nif we sample across multiple batch sizes we will tend to smooth this out.\n\nThis brings us to how Core_bench works: Core_bench runs f in increasing\nbatch sizes and reports the estimated time of f () by doing a linear\nregression. In the simplest case, the linear regression uses execution time as\nthe predicted variable and batch size as the predictor. Several useful\nproperties follow as a consequence:\n\n1: The slope of the regression line represents the time per run of f (). This\nis the single most important number we are after.\n\n2: We no longer have to figure out the batch size. We can just start small and\nincrease the batch size until our time quota runs out. Consequently, we no\nlonger have to estimate the timer function and other such constant overheads.\nAll the constant overheads that are incurred once per sample have the effect of\nadding a constant overhead to all the samples. In other words, these constant\noverheads only contribute to the y-intercept of the linear regression and\nnot to the slope.\n\n3: The same approach extends to estimating memory allocation, wherein we do a\nlinear regression of minor allocations, promoted words and major allocations\nagainst batch size. 
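Points 1 and 2 can be illustrated with a toy ordinary-least-squares fit. This is not Core_bench's actual implementation, and the sample numbers are made up: we fake samples where each run costs 3ns and every batch pays a fixed 800ns timing overhead, and the fit recovers the per-run cost as the slope while the overhead lands in the intercept.

```ocaml
(* Toy illustration (not Core_bench's code): recover per-run cost from
   (batch_size, time) samples via a simple least-squares line fit. *)
let fit_slope_intercept xs ys =
  let n = float_of_int (Array.length xs) in
  let sum = Array.fold_left ( +. ) 0. in
  let sx = sum xs and sy = sum ys in
  let sxx = sum (Array.map (fun x -> x *. x) xs) in
  let sxy = sum (Array.map2 (fun x y -> x *. y) xs ys) in
  let slope = (n *. sxy -. sx *. sy) /. (n *. sxx -. sx *. sx) in
  let intercept = (sy -. slope *. sx) /. n in
  (slope, intercept)

let () =
  (* Synthetic samples: time = 3ns/run * batch_size + 800ns constant
     per-sample overhead (e.g. the cost of reading the timer). *)
  let batches = [| 10.; 100.; 1000.; 10000. |] in
  let times = Array.map (fun b -> (3. *. b) +. 800.) batches in
  let slope, intercept = fit_slope_intercept batches times in
  (* The slope is the per-run cost; the timer overhead goes to the intercept. *)
  Printf.printf "per-run: %.1fns, overhead: %.1fns\n" slope intercept
```

Because the per-sample overhead is constant across batch sizes, it cannot contribute to the slope, which is why Core_bench's estimate of the per-run cost doesn't require measuring the timer's cost separately.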
Similarly, we can also estimate the number of GCs per call\nto f using the same approach. For very cheap functions we would see numbers\nsuch as one minor collection for every 10k runs of f ().\n\nFurther, since the rate of memory allocation and other such effects are a property\nof the semantics of the particular f we are measuring, for making programming\nchoices it is valuable to have an amortized measure of f that includes the GC\ncosts and other such runtime costs. In other words, we want to measure f\nprimarily in terms of batch_size over a very large number of runs.\n\nCumulative effects like memory allocation imply that batch runtimes are not\nindependent. Expensive batches, i.e. ones where a GC has occurred, tend to be\nfollowed by relatively cheaper batches (since there is more free memory\navailable and hence less GC pressure). To try to measure larger batches for the\nsame amount of sampling time, Core_bench geometrically samples batch sizes\ninstead of linearly sampling them. The graph below is a plot of the execution\ntime versus batch size of the same function f with the same amount of total\ntime spent collecting samples as the previous graph, but with batch size\nincremented geometrically:\n\n\n\nThe resulting estimate of runtime vs batch_size includes the amortized cost of\nGC overheads. This amortized cost is valuable in evaluating performance\ntrade-offs that one might face when comparing multiple design choices of f.\nSpecifically,\n\n1: If two functions have the same nominal execution time, but one does more\nallocation than the other, then the one that allocates more will have a higher\namortized execution time reported by the regression. This will be reflected as\nhaving a higher slope in the time vs. 
runs graph of the functions.\n\n2: If both functions have similar nominal execution times, but one allocates n\nwords to the minor heap and the other allocates n words to the major heap, the\ngraph above will show a steeper slope for the one that allocates directly to the\nmajor heap.\n\nThis is the essence of Core_bench, and the above amortized estimate is what\nit is most commonly used for. This typically gives us a single number that helps\nus choose between two implementations. It also estimates memory allocation and\nthe amortized number of garbage collections caused by functions, and this helps us\ndiscover ways to structure code such that allocation is minimized. All of this\nhelps us build a mental model of the costs of various operations and hence lets\nus make better choices when writing performance-sensitive code.\n\nEstimating GC costs\n\nA natural extension of the above approach is to use multivariate linear\nregression to explain the performance profiles of some functions. Instead of\nexplaining runtime of functions purely in terms of batch_size as a single\npredictor in the linear regression, we can split up the runtime using multiple\npredictors such as batch_size, number of GCs and other parameters.\nCore_bench has experimental support for this. This becomes interesting for\nsome functions such as the one below:\n\nlet benchmark = Bench.Test.create ~name:\"List.init\"\n (fun () -&gt; ignore(List.init 100_000 ~f:id))\n\n\n\nIf we plot increasing batch sizes vs execution time of f as we did before,\nthis function shows a very strange profile:\n\n\n\nIn some regions of this graph, the execution time taken per batch seems to\ndecrease as the batch size increases. Paradoxically, we seem to be taking\nless and less time for more runs of f (). In other words, in\nsome ranges the graph has a negative slope, implying that f () has a\nnegative execution time. 
(It is interesting to puzzle out why this function has\nthis strange execution profile.)\n\nIt is clear that the execution time of f () responds to something other than\njust the batch_size. Doing a simple linear regression against batch size gives\nus the linear slope plotted above. While it gives us a sense of the cost of f,\nit really does not explain the time profile. Doing a multivariate linear\nregression is interesting in such cases because of its explanatory power:\nspecifically, it can break down the execution time of f () into component\ncosts per predictor.\n\nHere is a plot of the execution time of f along with the various other\nmeasures:\n\n\n\nIntuitively, one can see that the execution time might be better explained by\nlinearly combining the batch_size, the promoted words and the compactions.\nDoing linear regression with all three predictors results in a much better fit\nand a much better explanation of the runtime of f in terms of all three\naspects of the function that contribute to its execution time.\n\n\n\nThough the current version of Core_bench can do such multivariate regressions,\nthis feature is considered experimental primarily because it is difficult for\nthe average user to guess which predictors might be relevant to the execution\ntime of an arbitrary function. One thing we’d like to do in a future iteration\nof Core_bench is to have the library automatically search the space of possible\npredictors and report the most relevant ones. We think it would be interesting\nto expose values that come from CPU counters, such as last-level cache misses, and\nuse such values as inputs to such a predictor selection algorithm.\n\nConclusion\n\nIt is worth noting that there is little that is OCaml-specific in the design, and the\napproach should work for other garbage-collected languages as well. 
It would be\ninteresting to see ports of Core_bench to other systems and see what it teaches\nus about the performance characteristics of those systems.\n\nAt Jane Street we use an accompanying syntax extension that allows us to define\nmicro-benchmarks inline in .ml files, much like we do with unit tests. Having\ninline micro-benchmarks allows us to generate performance numbers from our main\nlibraries and to track performance regressions over time. Much of the work for\nthe syntax extension and related reporting infrastructure was done by Sebastian\nFunk from Cambridge University, who interned in the summer of 2013. At the time of\nthis writing Core_bench produces pretty stable estimates of the execution costs of\nfunctions and is being used actively in development. We hope you’ll find it\nuseful in developing valuable intuition for writing performance-sensitive code.\n",
        "url"      : "https://blog.janestreet.com/core_bench-micro-benchmarking-for-ocaml/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Hiring a compiler engineer",
        "date"     : "May 15, 2014",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "Jane Street is looking to hire an experienced compiler engineer to work on\nimproving the OCaml compiler. The focus would be on performance-related\nimprovements to the compiler and runtime system. The job would also include\nworking on other aspects of the compiler and the supporting toolchain including\nour internal development tools. We’re particularly interested in people with\nexperience in areas like optimization, GC and language runtimes, and are happy\nto consider candidates who are not (yet) OCaml experts. The position would be\nfull-time, and could be based in either London or New York.\n\nIf you’re interested (or know someone I should reach out to), please email me\ndirectly, at yminsky@janestreet.com.\n",
        "url"      : "https://blog.janestreet.com/hiring-a-compiler-engineer/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Why change is hard",
        "date"     : "May 14, 2014",
        "authorId" : "juecker",
        "author"   : null,
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "At Jane Street we have a number of systems that are vital to the operation of\nthe firm. As the company has grown, so have these systems, and in the process\nacquired a few oddities. The below is a post I wrote a while ago for our\ninternal blog to give people outside of tech an idea of why it often takes a\nlong time to add features to these systems or straighten out some of their weird\nbehaviors.\n\nImagine you’re driving your car. It’s an okay car. It drives. It may be a little\nrusty, and the engine makes some rattling sounds, but you’re not too concerned.\nActually, upon closer inspection you notice one of the tires is flat. Funny, you\nmust’ve been driving like this for years. Come to think of it, you did always\nhave to pull pretty hard on the steering wheel to avoid veering off the road…\nYou should probably do something about this.\n\nEasy, just stop and change the tire, should only take a few minutes. But here’s\nthe catch: you can never stop. Well. That’s annoying. You can probably still\nchange the tire if you spend a lot of time preparing, make a careful plan, and\nthen risk your life.\n\nAssuming you survive the tire change, there’s still that rattling noise in the\nengine. That’s obviously going to be an issue sooner or later, and no amount of\nacrobatics is going to help you change the engine whilst driving. Hm. So I lied\nbefore. You can stop the car, but only for a few minutes at a time, and you’d\nbetter be damn sure the car will start again when you need to go on. So you\ncan’t just stop at a garage and change the engine – that’ll take too long and\nwho knows whether they’ll connect all the hoses and gears and stuff1 the\nright way on the first try. 
You adopt the obvious solution:\n\n\n  At a quick stop, have a new engine strapped to the roof of your car\n  Connect things up so that you can switch between the old and the new\nengine2\n  At another quick stop remove the old engine\n  Yet another quick stop, move the new engine from the roof into the engine\ncompartment\n  Spend the next 6 months shortening excess hose, smoothing out dents in your\nroof, and scrubbing away oil stains in the upholstery.\n\n\nThe bottom line is that when you want to make any kind of change to a system\nthat can never be shut down, or can only be down for very short amounts of time,\nyou probably can’t (or at least shouldn’t) just make that change in one go. You\nhave to break it down into very small, well understood steps, not all of which\nmay directly contribute to what you actually want to achieve.\n\n\n  \n    \n      It should be evident at this point that I don’t know anything about cars. &#8617;\n    \n    \n      See 1. &#8617;\n    \n  \n\n",
        "url"      : "https://blog.janestreet.com/why-change-is-hard/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Better namespaces through module aliases",
        "date"     : "May 12, 2014",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 4,
        "content"  : "(OCaml 4.02 is entering a feature freeze, which makes it a good time to stop\nand take a look at what to expect for this release. This is part of a series of\nposts where I’ll describe the features that strike me as notable. This is part\n2.)\n\nOCaml has a bit of a namespace problem.\n\nIn particular, OCaml has no good way of organizing modules into packages. One\nsign of the problem is that you can’t build an executable that has two modules\nwith the same module name. This is a pretty awkward restriction, and it gets\nunworkable pretty fast as your codebase gets bigger.\n\nOther than just prefixing all of your module names with a package name (e.g.,\nCore_kernel_list, Core_kernel_int, Core_kernel_array, etc. It gets old\nfast.), the only solution right now is something called packed modules. OCaml\ncan pack a collection of individual modules into a single synthetic “packed”\nmodule. Importantly, different packs included in the same executable are allowed\nto contain modules of the same name.\n\nIn practice, a packed module is a lot like what you’d get if you named all of\nyour modules distinctly, and then used a single module to pack together all\nyour other modules, giving them shorter and more usable names in the process.\nThus, for Core_kernel, we could name all our modules uniquely, and then\nprovide a single renaming module to allow people to use those modules\nconveniently, like this:\n\nmodule List  = Core_kernel_list\nmodule Array = Core_kernel_array\nmodule Int   = Core_kernel_int\n...\n\n\n\nAnd then user code could use these short names by opening the module:\n\nopen Core_kernel\n\nlet drop_zeros l = List.filter l ~f:(fun x -&gt; x &lt;&gt; 0)\n\n\n\nIn the above, List refers to Core_kernel’s list, not the List module that\nships with the compiler. 
The longer names would only show up within the\nCore_kernel package.\n\nPacked modules basically automate this process for you, with the one improvement\nthat you get to use the short names within the package you’re building as well as\noutside of it.\n\nWe use packed modules extensively at Jane Street, and they’ve been a real help\nin organizing our large and complex codebase. But packs turn out to be highly\nproblematic. In particular, they lead to three distinct problems.\n\n\n  slow compilation of individual files\n  large executable sizes\n  coarse dependency tracking, leading to slow incremental rebuilds.\n\n\nThe slow compilation of individual files comes from the cost of interacting with\na large module like Core_kernel. Core_kernel is large because it effectively\ncontains a full copy of every module in the Core_kernel package. That’s\nbecause a line like this:\n\nmodule List = Core_kernel_list\n\n\n\ndoesn’t simply make Core_kernel.List an alias to Core_kernel_list; it makes\na full copy of the module. Indeed, the above line is equivalent to the\nfollowing.\n\nmodule List = struct include Core_kernel_list end\n\n\n\nPacked modules also increase your executable size, since OCaml includes code at\nthe compilation unit granularity. Because packed modules are compilation\nunits, referring to even a single module of Core_kernel requires you to link\nall of Core_kernel into your executable.\n\nThe coarse dependency problem has to do with the fact that a packed module\ndepends on all the modules that are included in it, and so once you depend on\nanything in the pack, you depend on everything there. For us, that means that\nchanging a single line of the most obscure module in Core_kernel will cause us\nto have to rebuild essentially our entire tree.\n\nModule aliases, along with a few related improvements to the compiler, let us\nwork around all of these problems. 
In particular, in 4.02, the following\nstatement\n\nmodule List = Core_kernel_list\n\n\n\nis in fact an alias rather than a copy. This means that opening Core_kernel\nwould only introduce a bunch of aliases, which does not require a lot of work\nfrom the compiler.\n\nExecutable size will be improved because we’ll be able to move to having a\npackage be structured as a module containing a set of aliases, rather than as a\npack. That means we no longer have a single large compilation unit for the\nentire package, and so, using some improved dependency handling in the compiler,\nwe can link in only the modules that we actually use.\n\nFinally, the dependency-choke-point problem will be fixed by having a tighter\nunderstanding of dependencies. In particular, the fact that I depend on\nCore_kernel, which contains a collection of aliases to many other modules like\nCore_kernel_list or Core_kernel_array, doesn’t mean I truly depend on all\nthose modules. In particular, if I don’t use (and so don’t link in)\nCore_kernel_array, then I don’t need to recompile when Core_kernel_array\nchanges.\n\nModule aliases have other uses, in particular having to do with changes to the\nsemantics of functors. But for us, the changes to compilation speed and\nexecutable size are the big story.\n",
        "url"      : "https://blog.janestreet.com/better-namespaces-through-module-aliases/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Extension Points, or how OCaml is becoming more like Lisp",
        "date"     : "May 8, 2014",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "(OCaml 4.02 is entering a feature freeze, which makes it a good time to stop\nand take a look at what to expect for this release. This is the first of a few\nblog posts where I’ll describe the features that strike me as notable.)\n\nOCaml’s metaprogramming story is kind of messy.\n\nThe good news is that OCaml has an effective metaprogramming system. It’s called\ncamlp4, and before complaining about it, I want to be clear how useful it is.\nAt Jane Street, syntax extensions like sexplib, pa_compare and binprot\nhave made us more productive, allowing us to extend the language in ways that\nsave us from having to write big piles of unmaintainable boilerplate.\n\nBut camlp4 has some serious warts, which mostly derive from where it sits in\nthe OCaml pipeline. In particular, camlp4 is an alternate front-end to the\ncompiler, with its own extensible parser that allows you to extend OCaml’s\nsyntax in any way you like. There are downsides, however, that derive from\nhaving two different parsers. In particular, camlp4’s parser has its own\nslightly different behavior, its independent set of bugs, and its own excitingly\nobscure error messages.\n\ncamlp4’s separate parser might seem like a necessary evil. After all, syntax\nextensions require changing the syntax, and you obviously can’t change the\nsyntax without a new parser.\n\nOr can you? Lisp has a rich syntactic macro system but just one parser. Thus, in\nLisp, all macros are AST-to-AST transformations. Lisp’s macro system can\nimplement lots of different syntaxes within the world of s-expressions, since\ns-expressions are so general and flexible.\n\nOCaml’s syntax, on the other hand, is very specific and inflexible. It lets you\nparse OCaml’s syntax exactly, and any deviation is flagged as a syntax error.\nAlmost any interesting syntax extension will require parsing programs that are\nnot syntactically valid OCaml programs.\n\nThat’s where extension points come in. 
Extension points are a collection of\nextensions to OCaml’s grammar that add a notation for annotations. With these\nannotations, OCaml’s syntax becomes general enough to accommodate many different\nsyntax extensions. Indeed, Alain Frisch, who is the main author and advocate of\nthis change, organized a survey of existing camlp4-based syntax extensions, and\nmade sure that extension points were rich enough to accommodate them.\n\nThe big advantage of this approach is that it simplifies the development of the\ncompiler (there is no need to maintain two independent implementations of the\nparser) and gives development tools just one syntax to target. One of the wins\nwe hope to get from this is that IDE-like tools like Merlin should be able to\nmore easily interact with code that uses syntactic macros, like the codebase at\nJane Street.\n\nThe downside, of course, is that to take advantage of this, you need to port\nyour existing syntax extensions to work against the (now extended) OCaml AST.\nAlso, it means that the concrete syntax that was used in existing syntax\nextensions will mostly need to change. No longer can we write\n\ntype t = int * string with sexp\n\nInstead, we’ll need to write something like:\n\ntype t = int * string [@@sexp]\n\nThat said, we expect this change to be worth the implied churn.\n\np.s., it was noted that I didn’t do a great job of showing the flexibility that\nextension points give you. If you want to learn more about this,\nthis goes over the major use-cases for syntax\nextensions that were considered in the design of the annotation syntax, and how\nthey would be rendered in OCaml 4.02.\n",
        "url"      : "https://blog.janestreet.com/extension-points-or-how-ocaml-is-becoming-more-like-lisp/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "How to fail -- introducing Or_error.t",
        "date"     : "April 23, 2014",
        "authorId" : "dhouse",
        "author"   : "David House",
        "tags"     : [],
        "minsToRead" : 20,
        "content"  : "There are a bunch of different ways of handling errors in OCaml. If you’ve just\nstarted learning about functional programming, you’ll no doubt have come across\nthe famed option type. Here’s its definition:\n\ntype 'a option = Some of 'a | None\n\n\n\nFunctions that might fail return an option instead of the corresponding\n“unwrapped” type. For example, the function to get the first element of the list\nhas type List.hd : 'a list -&gt; 'a option – it might fail if the list is\nempty. Ask a functional programming fanatic what the best things about static\nexpressive type systems are, and they’re bound to mention the option type very\nhigh up the list. The option type makes it explicit in the type system which\nfunctions might fail. If you’re writing such a function it gives you a mechanism\nto force your caller to check your error case. All the other classic ways of\nhandling errors – returning null pointers, returning some special value (like\n-1), setting global error variables or raising exceptions – require the caller\nto remember to do a thing. With option types, the compiler checks you haven’t\nforgotten. It’s impossible to overstate the significance of this. This is going\nto be a much longer blog post if I get started down this path of evangelism, so\nlet’s move on 🙂 But as you start to crank out your project, you’ll quickly\nrealise that option has its limitations. For example, consider the following\nfunction:\n\nval Hashtbl.of_alist : ('a * 'b) list -&gt; ('a, 'b) Hashtbl.t option\n\n\n\n(Eagle-eyed readers will spot that this is not how the hashtable type works in\ncore, because we eschew polymorphic comparison as much as possible. But\nthat’s another blog post.)\n\nThis function can fail in the case where you have a duplicate key. But this is a\nuser-experience disaster waiting to happen. 
I’m picturing a gigantic config\nfile, involving (the sexp representation of) a large hashtable with hundreds of\nkeys; you launch your program, and are presented with an error that says\n“duplicate keys”. Taking a peek under the hood:\n\nlet hashtbl =\n  match Hashtbl.of_alist alist_from_config_file with\n  | Some table -&gt; table\n  | None -&gt;\n    (* I really want to give the user a better error, but I don't have\n       enough information! *)\n    failwith \"Duplicate keys\"\nin\n\n\n\nHow tragic.\n\nThe alternatives\n\nOne of the disadvantages of having such an expressive type system is that\nthere are so many ways to skin a cat, and you end up with inconsistent and\nincomposable code. Here are some of the ways that we solved the option problem.\n\n1. Define a new variant type that lists all the failure cases.\n\nDefining types in OCaml is cheap, right? So why not define one for each function\nthat might fail, which lists all of the ways it can fail, and use that?\n\ntype ('a, 'b) of_alist_result =\n| Ok of ('a, 'b) Hashtbl.t\n| Duplicate_keys of 'a list\n\nval Hashtbl.of_alist : ('a * 'b) list -&gt; ('a, 'b) of_alist_result\n\n\n\nWell, unsurprisingly, this turns out pretty lexically heavy and annoying to do.\nHowever, polymorphic variants make it cheaper by relieving the burden of naming\nthe type:\n\nval Hashtbl.of_alist\n  : ('a * 'b) list -&gt; [ `Ok of ('a, 'b) Hashtbl.t | `Duplicate_keys of 'a list ]\n\n\n\n(This also makes the client-side code more legible as they can just say\n“| `Ok table -&gt; ...” rather than “| Hashtbl.Ok table -&gt; ...“.) This has\nsome nice things going for it. Another disadvantage of option is that, if your\nfunction can fail in one of many different ways, you have no way to communicate\nthat to your caller. All you can do is return None. But here we can list out the\nways, and give them clear names, which will appear in the caller’s code. The\nmain disadvantage of this approach is that it’s not composable. 
You end up\nwriting code that looks like this:\n\nmatch thing_that_might_fail_1 () with\n| `Failure_case_1 -&gt; (* somehow report error *)\n| `Ok x -&gt;\n  match thing_that_might_fail_2 x with\n  | `Failure_case_2\n  | `Failure_case_3 -&gt; (* report error here too *)\n  | `Ok y -&gt;\n    match thing_that_might_fail_3 x y with\n    | `Failure_case_4 -&gt; (* same treatment *)\n    | `Ok z -&gt;\n       ...\n\n\n\nWhat I want to say here is: just do these things in order; if any of them fail,\nstop and tell me that error; at each stage I might want to use the results of\nthe previous (successful) stages when computing the next. Commonly, you want to\ntreat all the errors in a similar way: maybe convert them to a string and log to\na file, or something. But you’re forced to write very verbose code. Similarly,\nyou really start to miss “generic” functions like Option.map which can operate\non the results of any potentially-failing computation and transform it in some\nway. With this approach, you have to write a new mapping function for each\nfunction that might fail! It seems we’re expressing too much in the type\nsystem here: if the types were a little less specific about the ways our\nfunctions might go wrong (and, as we said, we often don’t care about the\ndetails, as long as there’s some way of getting a string of the error out), we’d\nhave an easier time dealing with failure in general.\n\n2. Use the Result type\n\nLet’s extend the option type:\n\ntype ('a, 'b) Result.t =\n| Ok of 'a\n| Error of 'b\n\n\n\nThat is, either the calculation was successful, and we got the 'a that we\nwanted, or it failed, in which case I have a 'b to tell me some more about\nwhat went wrong. The question is: what should we use in the 'b type?\n\n2a. ('a, [ `Failure_case_1 | `Failure_case_2 ]) Result.t – one option is\njust to “lift” the `Ok constructor from all our individual variants into\nResult, and still use polymorphic variants to list the error cases. 
This makes\nit possible to use those “generic” functions like Result.map. This is pretty\ngood. However, you still end up writing the “ever-indenting” style of code as\nabove.\n\n2b. ('a, exn) Result.t – the next idea is, instead of using an explicit\nvariant, we’ll use exn. Exceptions in OCaml are actual values: you can throw\nthem around between functions, store them in tables etc. (Then there’s a\nfunction called “raise” that lets you actually throw an exception value.) All\nexceptions have type exn, which is a bit like an “open ended variant type”. If\nyou have an exn in your hands, you can match on it just as normal:\n\nmatch my_exception with\n| Not_found -&gt; ...\n| Invalid_argument _ -&gt; ...\n| _ -&gt; ...\n\n\n\nBut, anyone can add to this variant by saying “exception My_exception”. (So\nmatch statements on exceptions will always need to have an “underscore case”.)\nAnd there’s a generic way to convert exceptions into strings. The fact that all\nof our errors have the same error type makes it possible to use the Result monad\nto improve our verbose mess from above:\n\nlet result =\n  let open Result.Monad_infix in\n  thing_that_might_fail_1 () &gt;&gt;= fun x -&gt;\n  thing_that_might_fail_2 x &gt;&gt;= fun y -&gt;\n  thing_that_might_fail_3 x y\nin\nmatch result with\n| Ok z -&gt; (* huzzah *)\n| Error e -&gt; log_error e (* the error handling code exists only once, here *)\n\n\n\nThere are a number of small-ish problems with this type:\n\n\n  One has to define an exception constructor for each error case. This ends up\nbeing non-local to the code.\n  The code to convert an exn to a sexp is very complicated indeed, and has\ncases that can leak space.\n  Matching on exceptions, although possible, should be discouraged. There’s no\nway apart from comments for a function to indicate which exceptions it might\nthrow. 
So if client code begins matching on a certain exception, that\nfunction can never use a different constructor if, for example, it wants to\nadd some extra information to the error. This is why we’re stuck in Core\nwith throwing “legacy” exceptions like Not_found in a few places.\nNow, ('a, exn) Result.t does not require you to match on exceptions, but\nit does at least make it possible, and we’d like to discourage it.\n\n\n2c. ('a, string) Result.t – The idea here is: well, someone is likely to\nwant to convert this error to a string eventually, so let’s just represent\nerrors as strings. If we want to include other OCaml values to provide context\n(e.g. the list of ‘a values that were the duped keys), then we convert them to\nstrings (probably by converting to sexps and from there to strings) and build up\nthe string we want. We don’t have any of the disadvantages of the above use of\nexceptions. And of course, we can still use the Result monad. The cons here are\nquite subtle. The trouble is: sometimes, we actually don’t want to consume any\nerror that might get produced. And constructing a large number of error strings\ncan be very expensive. You have to do all the conversion to sexps and strings\nupfront, regardless of whether someone wants to consume it in the end or not.\n\nIntroducing Error.t and Or_error.t\n\nThere was a clear need for unity. Having different libraries (or different\nfunctions within the same library…) using different conventions is really\nannoying – you end up doing a lot of converting between different error types,\nwhich aside from being code noise, is a needless performance penalty, and most\nimportantly, can make errors a lot harder to read. This last disadvantage is the\nso-called “tagging problem”. Failures deep down in software often need to bubble\nup a few layers before they get written to a log or presented to the user or\ncrash the program or whatever. All of those layers might want to add a bit\nmore context. 
If you are using lots of different ways of representing errors, it\nbecomes impossible to read the resulting errors: sexps containing strings are\nconverted themselves to strings, which escapes all the quote characters; if this\nprocess iterates, you can end up with snippets that look\nlike \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"foo.ml.My_exception ..., and the error is quite\nillegible without a sed script to strip out the duplicate slashes 🙂 So, it’s\nlikely that using any of the above solutions consistently would have been better\nthan doing nothing. But, instead, we chose to define a new type, and push hard\nto get it adopted everywhere. That type is Or_error.t:\n\ntype 'a Or_error.t = ('a, Error.t) Result.t\n\n\n\nThe key here is the new type Error.t. This is, to a first approximation, a lazy\nstring. That is, we still get the advantages that tagging is easy, but we don’t\nhave to pay the up-front costs of constructing a string. We’ve done benchmarks,\nand constructing an Error.t is “pretty cheap”. Here’s how it’s done:\n\nError.create \"problem with foo\" (4, Some \"baz\") &lt;:sexp_of&lt; int * string option &gt;&gt;\n\n\n\nThat is, you give it a string, some additional values as context, and the sexp\nconverter, for which you can use the convenient quotation syntax afforded by\npa_sexp. Here are a few more handy snippets:\n\nError.create \"values were not equal\"\n  (`from_system_a thing_a, `from_system_b thing_b)\n  &lt;:sexp_of&lt; [ `from_system_a of Thing.t ] * [ `from_system_b of Thing.t ] &gt;&gt;\n (* using polymorphic variants to make things more readable *)\nError.of_string \"couldn't find a foo\" (* no additional values *)\nError.tag error \"call to my_deeper_library failed\" (* tagging *)\nError.tag_arg error \"call to my_deeper_library failed\"\n (4, Some \"baz\") &lt;:sexp_of&lt; int * string option &gt;&gt; (* tagging with arguments *)\nOr_error.error_string \"couldn't find a foo\" (* shorthand for Error (Error.of_string ...) 
*)\nOr_error.error \"problem with foo\" (4, Some \"baz\") &lt;:sexp_of&lt; int * string option &gt;&gt;\n  (* shorthand for Error (Error.create ...) *)\nOr_error.tag or_error \"call to my_deeper_library failed\"\n  (* shorthand for grabbing the error out, tagging it, then wrapping it up again *)\nOr_error.tag_arg error \"call to my_deeper_library failed\"\n (4, Some \"baz\") &lt;:sexp_of&lt; int * string option &gt;&gt; (* similarly *)\n\n\n\nAnd, of course, the convenient monadic syntax is possible with Or_error.t.\nAlso, opening Core.Std brings some names into scope:\n\nok_exn (* short for Or_error.ok_exn *)\nerror (* short for Or_error.error *)\n\n\n\nHaving these two available without module qualifiers emphasises our choice of\nOr_error as the default way of doing errors. In particular, having a short name\nfor ok_exn is very convenient – in the past, we’ve often defined pairs of\nfunctions in Core, one called foo that returned an option or an error, and one\ncalled foo_exn that calls Option.value_exn or Result.ok_exn on the return\nvalue. But, having ok_exn in the global scope reduces the need for this. 
It’s\nnot completely free, since it’s still an allocation, so beware of using it in\nyour hottest loops – you might consider statically allocating some errors, which\nhelps a lot, although it restricts you to Error.of_string rather than\nError.create.\n\nA larger example\n\nHere’s the configuration module for a simple minesweeper clone:\n\nopen Core.Std\nopen Async.Std\n\nmodule Dimensions = struct\n  type t = { width : int; height : int }\n  let area { width; height } = width * height\n  let both_nonnegative { width; height } =\n    Or_error.combine_errors_unit\n      [ if width &gt; 0 then Ok () else Or_error.error \"width &lt;= 0\" width &lt;:sexp_of&lt; int &gt;&gt;;\n        if height &gt; 0 then Ok () else Or_error.error \"height &lt;= 0\" height &lt;:sexp_of&lt; int &gt;&gt;\n      ]\nend\n\ntype t =\n  { play_grid_in_blocks : Dimensions.t;\n    block_size_in_px : Dimensions.t;\n    num_mines : int;\n    background_color : [ `White | `Black ]\n  } with sexp\n\nlet validate ({ play_grid_in_blocks;\n                block_size_in_px;\n                num_mines;\n                background_color = _\n              } as t) =\n  let open Or_error.Monad_infix in\n  Dimensions.both_nonnegative play_grid_in_blocks\n  &gt;&gt;= fun () -&gt;\n  Dimensions.both_nonnegative block_size_in_px\n  &gt;&gt;= fun () -&gt;\n  begin\n    let playable_area = Dimensions.area play_grid_in_blocks in\n    if playable_area &gt;= 4 then\n      Ok playable_area\n    else\n      Or_error.error_string \"playable area must be at least 4 blocks\"\n  end\n  &gt;&gt;= fun playable_area -&gt;\n  begin\n    if num_mines &lt;= playable_area then Ok () else\n      Or_error.error \"too many mines\" (num_mines, `playable_area playable_area)\n        &lt;:sexp_of&lt; int * [ `playable_area of int ] &gt;&gt;\n  end\n  &gt;&gt;= fun () -&gt;\n  Ok t\n\nlet load file =\n  Reader.load_sexp_exn file &lt;:of_sexp&lt; t &gt;&gt;\n  &gt;&gt;= fun t -&gt;\n  Deferred.return\n    (Or_error.tag_arg (validate t) \"invalid 
configuration\" t &lt;:sexp_of&lt; t &gt;&gt;)\n\n\n\nPoints to make:\n\n\n  If you’re writing in async too, as is typical inside Jane Street, then you\nhave to be careful as to when the monadic infix operators (&gt;&gt;= and &gt;&gt;|)\nmean async-monad and when they mean Result-monad. I favour local opens of\none of the Monad_infix modules, like in validate above, to keep things\nclear.\n  val Or_error.combine_errors_unit : unit Or_error.t list -&gt; unit Or_error.t is\na really nice function. If you have a bunch of checks, none of which depend\non the result of any other checks, then you can stick them all in a list and\ncall this function, which will return Ok if all of them are Ok, and one\nsingle Error, representing all the errors, if not.\n  I used the record-deconstruction syntax in validate to check I was\nvalidating all the fields. (Recent versions of OCaml will give you a warning\nunless you either mention all fields or put “; _” at the end of your\npattern, and if you’re at all sensible you convert compiler warnings into\nhard errors :)) Note that there’s nothing to validate\nfor background_color, but the compiler forced me to explicitly say so.\nThis is a powerful trick.\n  Using polymorphic variants as labels, as in the validation of num_mines,\nis really nice, but you do have to repeat the name of the variant in the\nsexp converter. A price worth paying, though.\n  Tagging is really awesome. Every time one writes\n | Error _ -&gt; (* construct another error *)“, that’s a red flag that you\nshould be tagging the original error and propagating it instead. (In fact,\nevery time you say “| Error _” is probably a mistake – at least there\nshould be a comment on why you’re throwing away this error.) But, there\nis no clear contract on who should be doing the tagging, caller or callee.\nE.g. is it up to load to say “invalid configuration”, or the caller of\nload? 
Sadly, no clear convention has yet established itself.\n\n\nThe importance of consistency\n\nOne place where Or_error does not apply: occasionally it’s crucial to be\nable to enumerate your error cases and force the caller to go through them one\nby one. This is not the common case – generally callers just want to know if\ntheir call was successful or not, and if not, be able to tell a human why not\n(i.e. convert the error to a string). But there are times when it’s worth\nspelling it out. In that case, we either use 1. or 2(a) – it doesn’t much\nmatter, because this will only be for a small minority of your code. Also, it’s\noften okay to use option. Some functions have a sufficiently small scope that\nthey can only fail in one way, and that way is obvious given the function’s\nname. For example, in Core, List.hd still returns an option – it’s clear that\nthe only error case is the empty list. Moreover the caller can probably give a\nbetter idea of what’s gone wrong: rather than the error saying “empty list”, it\ncould say something more like “item foo didn’t match any classification\nfilters”. I don’t think it’s obvious that Error.t is the best type to use in\nall circumstances. It does seem to hit the sweet spot among all the\naforementioned options, but we probably could have been successful pushing,\ne.g., “Or_exn.t”. But, consistency is extremely important. It’s really nice\nthat you can call into two libraries and sequence the results together\nusing the Result monad. It can be worth using Or_error in some situations where\nit might be marginally preferable to use a different approach. We’ve found it to\nbe a big win for consistency, composability and readability of errors.\n",
        "url"      : "https://blog.janestreet.com/how-to-fail-introducing-or-error-dot-t/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Disabling Chrome's x-webkit-speech vulnerability",
        "date"     : "April 22, 2014",
        "authorId" : "rsclater",
        "author"   : "Robert Sclater",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "It’s been a busy couple of weeks for Internet security! Almost unnoticed amongst\nthe ‘Heartbleed’ fallout was a post on Guy Aharonovsky’s\nblog\ndetailing how Google Chrome’s speech-to-text features can be used to snoop on\nanything you say near your computer — via a single tag attribute and some CSS.\n\nThe exploit, in a nutshell:\n\nA text box with the x-webkit-speech attribute lets the user click a microphone\nicon and speak text into the box. With some simple stylesheet tricks, the\nblogger shows how to hide the text box (and subsequent pop-up) so that speech\ncan be captured without the user’s knowledge.\n\nOkay, so that’s Not Good. How do we fix it?\n\nThe Chrome devs responded\nquickly (especially\nonce the proof-of-concept was made public), removing x-webkit-speech support\nfrom the upcoming Chrome v36. But that’s not due for stable release until\nmid-May — we needed something to\nprevent this method of snooping in the meantime.\n\nLuckily, Chrome has a pretty awesome Extension system, so it was near-trivial to\nbuild a proof-of-concept extension that simply removes the ‘x-webkit-speech’\nattribute from any &lt;input&gt; tag on the page — the first draft was just a\nboilerplate ‘manifest’ file and 4 lines of code, but it worked!\n\nAfter some testing the plugin was extended to listen for DOM changes (so it\ncould detect if a speech input was added to the page via Javascript).\nAdditionally, a ‘page icon’ was added to give UI feedback that speech had been\ndisabled, which the user can click to re-enable speech input if desired.\n\nThe extension is available in the Chrome Web\nStore.\n",
        "url"      : "https://blog.janestreet.com/disabling-chromes-x-webkit-speech-vulnerability/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "How Does Automount Work Anyway?",
        "date"     : "April 17, 2014",
        "authorId" : "cperl",
        "author"   : "Chris Perl",
        "tags"     : [],
        "minsToRead" : 45,
        "content"  : "Introduction\n\nAutofs/automount is a combination of a user space program and some pieces in the\nkernel that work together to allow filesystems (many kinds of filesystems, but\noften NFS) to be mounted “just in time” and then be unmounted when they are no\nlonger in use. As soon as a process wants to access an automounted filesystem,\nthe kernel intercepts the access and passes control to a user space program\n(typically automount, but systemd now supports some automount functionality as\nwell).\n\nThe user space program does whatever is necessary to mount the file system\n(which usually involves some invocation of mount(8)) and then reports success or\nfailure back to the kernel so the kernel can either allow the process to\ncontinue or signal a failure.\n\nWe use automount for a number of things here at Jane Street. Recently, users\nstarted reporting that directories that shouldn’t exist (i.e. some path on an\nautomounted filesystem for which the automount daemon has no configuration) were\nspontaneously appearing and not going away. Most commonly, users were seeing\ndubious output from commands like hg root and hg status, both cases in which\nMercurial calls stat(2) on any path that seems like it might be a “.hg”\ndirectory. The problem was that “.hg” directories kept popping up in places\nwhere they didn’t actually exist, causing these stat(2) calls to succeed and\nMercurial to believe it had found a valid repository directory. Because attempts\nto access this ghost “.hg” directory obviously fail, Mercurial provides odd\noutput for hg root and hg status. We were stumped, so I dug into automount to\ntry to find out where things were going wrong.\n\nDebugging\n\nI’m a big fan of dynamic tracing tools such as DTrace,\nSystemTap and Ktap\nfor troubleshooting problems like this. 
In this instance I used a SystemTap\nscript (developed as I went and presented at the end of this blog post) coupled\nwith simply browsing the available source code to better understand the\nautomount daemon’s behavior and ultimately present enough information to the\nright people to get our problem\nfixed.\n\nFor further info about using Dynamic Tracing to better understand your systems,\nI highly recommend reading Brendan Gregg’s book,\nDTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and\nFreeBSD.\n\nMaps\n\nThe configuration of the automount daemon is based on maps. The automount daemon\ntakes as input the location where it can find its master map. That map can be in\na file, LDAP or anywhere else, but the automount daemon needs a master map (see\nauto.master(5)).\n\nAssuming the master map is a file, each line of the master map consists of\nseveral fields. There is the mountpoint for the entry, the type (i.e. one of\n“file”, “program”, “yp” or several others), the format and then any options that\nare to be applied to this master map entry. If the master map entry specifies a\nmount point of “/-” then the map that it references is considered “direct”.\nOtherwise, it is considered “indirect.”\n\nIn addition, both direct and indirect mounts can have “offsets” (see\nautofs4-mount-control.txt,\nauto.master(5) and autofs(5) for further details).\n\nIn the end, from the viewpoint of autofs filesystems, it’s important to\nremember that with indirect mounts the autofs filesystem is mounted on the\nmountpoint specified in the master map (or in submaps if you have multiple\nlevels of nesting) and then filesystems are automounted as subdirectories of the\nautofs filesystem. For mounts that are direct, direct with offsets and indirect\nwith offsets, the autofs filesystem is mounted at the point where the automounted\nfilesystem will ultimately be mounted. 
As such, once that filesystem is mounted,\nthe underlying autofs filesystem has been shadowed.\n\nWe use both indirect and indirect with offset mounts here at Jane Street.\n\nUser to Kernel Communication\n\nInterfaces to the Kernel\n\nAll communication from user space to the kernel is done via ioctl(2).\n\nThe latest version of this interface, v5, uses a distinct set of ioctls from\nprior versions. The newer interface has advantages over the older interfaces, so\nit always makes sense to use the newer interface when you can.\n\nFor example, automount(8) from the autofs package will always try to use the\nnew interface and only if it runs into some kind of a problem will it fall back\nto the older interface (e.g. if access to “/dev/autofs” is denied due to\nSELinux).\n\nThis\ndocument\nhas a good description of the problem and motivation for the newer interface. If\nyou really want to understand automount/autofs, you should read that document.\n\nFor the rest of this post, I’ll focus specifically on the newer interface, which\nall new implementations should be using.\n\nThe New Interface\n\nSome of the main points of the newer interface are:\n\n\n  \n    All ioctls are issued to the “/dev/autofs” device node. Previously the user\nspace daemon needed to open the directory where the autofs filesystem was\nmounted and use that file descriptor to issue ioctls.\n  \n  \n    There is a command that allows the daemon to request that the kernel open\nthe autofs filesystem at a given path. The return value of this command is\nthe file descriptor that the daemon should use for all subsequent requests\npertaining to that filesystem. This allows you to access an autofs mount\npoint even if that mount point is currently shadowed by an automounted\nmount. 
This can happen with direct maps and any maps that use offsets if the\nautomount daemon was restarted while automounted filesystems are still in\nuse.\n  \n  \n    There are 14 different types of requests (or commands) that can be sent from\nthe user space daemon down to the kernel:\n\n    enum {\n  /* Get various version info */\n  AUTOFS_DEV_IOCTL_VERSION_CMD = 0x71,\n  AUTOFS_DEV_IOCTL_PROTOVER_CMD,\n  AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD,\n\n  /* Open mount ioctl fd */\n  AUTOFS_DEV_IOCTL_OPENMOUNT_CMD,\n\n  /* Close mount ioctl fd */\n  AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD,\n\n  /* Mount/expire status returns */\n  AUTOFS_DEV_IOCTL_READY_CMD,\n  AUTOFS_DEV_IOCTL_FAIL_CMD,\n\n  /* Activate/deactivate autofs mount */\n  AUTOFS_DEV_IOCTL_SETPIPEFD_CMD,\n  AUTOFS_DEV_IOCTL_CATATONIC_CMD,\n\n  /* Expiry timeout */\n  AUTOFS_DEV_IOCTL_TIMEOUT_CMD,\n\n  /* Get mount last requesting uid and gid */\n  AUTOFS_DEV_IOCTL_REQUESTER_CMD,\n\n  /* Check for eligible expire candidates */\n  AUTOFS_DEV_IOCTL_EXPIRE_CMD,\n\n  /* Request busy status */\n  AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD,\n\n  /* Check if path is a mountpoint */\n  AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD,\n};\n\n    \n  \n  \n    The parameter that is passed with each of the possible ioctl requests is\nalways the following structure (filled in and accessed with different member\nfields depending on the ioctl request):\n\n    struct autofs_dev_ioctl {\n  __u32 ver_major;\n  __u32 ver_minor;\n  __u32 size;       /* total size of data passed in, including this struct */\n  __s32 ioctlfd;    /* automount command fd */\n\n  /* Command parameters */\n\n  union {\n    struct args_protover     protover;\n    struct args_protosubver  protosubver;\n    struct args_openmount    openmount;\n    struct args_ready        ready;\n    struct args_fail         fail;\n    struct args_setpipefd    setpipefd;\n    struct args_timeout      timeout;\n    struct args_requester    requester;\n    struct args_expire       expire;\n    struct 
args_askumount    askumount;\n    struct args_ismountpoint ismountpoint;\n  };\n\n  char path[0];\n};\n\n    \n  \n\n\nThe size field encodes the total size of the parameter (i.e. the total size of\nthe current struct autofs_dev_ioctl being passed). The ioctlfd field encodes\nwhat autofs filesystem this ioctl is meant to operate on and the rest are ioctl\nspecific arguments.\n\nEach of the structs referenced in the union type provide field names (and data\ntypes) that allow access to the particular arguments that were passed in. I\nwon’t go through each of them, but as an example, here is the definition of\nstruct args_requester:\n\nstruct args_requester {\n  __u32   uid;\n  __u32   gid;\n};\n\n\n\nThis allows the kernel to write code like param-&gt;requester.uid to access the\nuid field in the parameter to the AUTOFS_DEV_IOCTL_REQUESTER_CMD ioctl call.\n\nThe Kernel Ioctl Dispatcher\n\nWhen the autofs4 kernel module is loaded (yes, it’s the autofs4 kernel module\nthat supports the version 5 protocol), it creates the /dev/autofs device node\nand registers the proper components such that ioctls issued from user space to a\nfile descriptor associated with /dev/autofs will wind up invoking the kernel\nfunction autofs_dev_ioctl.\n\nThat function uses a dispatch table to map the ioctl request it receives to a\ndedicated function in the kernel:\n\nstatic ioctl_fn lookup_dev_ioctl(unsigned int cmd)\n{\n  static struct {\n    int cmd;\n    ioctl_fn fn;\n  } _ioctls[] = {\n    {cmd_idx(AUTOFS_DEV_IOCTL_VERSION_CMD),      NULL},\n    {cmd_idx(AUTOFS_DEV_IOCTL_PROTOVER_CMD),     autofs_dev_ioctl_protover},\n    {cmd_idx(AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD),  autofs_dev_ioctl_protosubver},\n    {cmd_idx(AUTOFS_DEV_IOCTL_OPENMOUNT_CMD),    autofs_dev_ioctl_openmount},\n    {cmd_idx(AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD),   autofs_dev_ioctl_closemount},\n    {cmd_idx(AUTOFS_DEV_IOCTL_READY_CMD),        autofs_dev_ioctl_ready},\n    {cmd_idx(AUTOFS_DEV_IOCTL_FAIL_CMD),         
autofs_dev_ioctl_fail},\n    {cmd_idx(AUTOFS_DEV_IOCTL_SETPIPEFD_CMD),    autofs_dev_ioctl_setpipefd},\n    {cmd_idx(AUTOFS_DEV_IOCTL_CATATONIC_CMD),    autofs_dev_ioctl_catatonic},\n    {cmd_idx(AUTOFS_DEV_IOCTL_TIMEOUT_CMD),      autofs_dev_ioctl_timeout},\n    {cmd_idx(AUTOFS_DEV_IOCTL_REQUESTER_CMD),    autofs_dev_ioctl_requester},\n    {cmd_idx(AUTOFS_DEV_IOCTL_EXPIRE_CMD),       autofs_dev_ioctl_expire},\n    {cmd_idx(AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD),    autofs_dev_ioctl_askumount},\n    {cmd_idx(AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD), autofs_dev_ioctl_ismountpoint}\n  };\n  unsigned int idx = cmd_idx(cmd);\n\n  return (idx &gt;= ARRAY_SIZE(_ioctls)) ? NULL : _ioctls[idx].fn;\n}\n\n\n\nOne point worth noting here is that AUTOFS_DEV_IOCTL_VERSION_CMD doesn’t\nactually do very much (i.e. its function pointer is NULL).\n\nThere is logic in the kernel to validate that the major and minor fields\npassed in the autofs_dev_ioctl with the ioctl request are valid and then\nupdate the major and minor fields with the kernel’s values. However, this\nvalidation happens for all ioctls, so a AUTOFS_DEV_IOCTL_VERSION_CMD really is\na nop.\n\nIn practice, its invocation looks like:\n\n1396615913363043073:  automount(18060)(0xffff8803d19f8aa0) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_VERSION_CMD: major: 1, minor 0, size: 24, ioctlfd: -1\n1396615913363055840:  automount(18060)(0xffff8803d19f8aa0) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_VERSION_CMD: major: 1, minor 0, size: 24, ioctlfd: -1\n\n\n\nNote: The large numbers at the beginning of the line are nanosecond\ntimestamps. All of the examples showing tracing of various parts of\nautofs/automount look similar to the above.\n\nKernel to User Communication\n\nEstablishing A Communication Channel\n\nKernel to user space communication can happen in a few different ways.\n\nFirst, the kernel can communicate with user space via the return value from\nioctl(2). 
This is the least interesting communication channel.\n\nSecond, the kernel can (and does) modify the data in the autofs_dev_ioctl\npassed to it. Once the ioctl call completes, the daemon can access whatever\noutput parameters should be available in the autofs_dev_ioctl (which depends\non the specific ioctl that was requested).\n\nThird, notification of missing mounts or the need to umount existing mounts\nhappens via a regular unix pipe(2). The daemon creates a pipe just before it\nmounts a given autofs filesystem and passes the write end of that pipe as the\n‘fd’ parameter to the mount command:\n\n1396451770814396287:  automount(18388)(0xffff8803d5ae0aa0) -&gt; mount: dev: /etc/auto.foo, dir: /foo, type: autofs, data: fd=7,pgrp=18388,minproto=5,maxproto=5,indirect\n\n\n\nIn addition, it passes its process group id to the kernel so that when\nthe kernel receives VFS requests for the autofs filesystems, it can determine\nwhether it should pass the requests to the automount daemon or should operate in\n“Oz mode” and let the process “see the man behind the curtain.”\n\nI didn’t make that up, it’s in the kernel sources and it’s really called\noz_mode.\n\nIn the case where you’re mounting a new autofs filesystem, giving the kernel the\npipe to use for communication is fairly straightforward. But, what about the\ncase where the daemon has just been restarted and the autofs filesystems that\nyou wish to operate on are actually shadowed by other (say NFS) filesystems?\n\nThe AUTOFS_DEV_IOCTL_OPENMOUNT_CMD ioctl was created to address this problem.\nThis ioctl’s parameter includes a path (as part of struct autofs_dev_ioctl\ndescribed earlier). The request is issued with the ioctlfd field set to -1,\nsince we’re asking for the mount point to be opened; on return, the kernel will\nhave filled in ioctlfd with the file descriptor to use.\n\nOnce you have an ioctlfd with which to operate, you can create a pipe and issue\nan AUTOFS_DEV_IOCTL_SETPIPEFD_CMD ioctl which has the same effect as passing\nthe fd argument to mount(8). 
However, there is the additional restriction\nthat you cannot issue an AUTOFS_DEV_IOCTL_SETPIPEFD_CMD against an autofs\nfilesystem without first having set that autofs filesystem to be “catatonic”\nwith AUTOFS_DEV_IOCTL_CATATONIC_CMD.\n\nSee\nautofs4-mount-control.txt\nfor additional details.\n\nMessage Format\n\nMessages from the kernel to the user space daemon via the pipe have the\nfollowing format:\n\nstruct autofs_v5_packet {\n  struct autofs_packet_hdr hdr;\n  autofs_wqt_t wait_queue_token;\n  __u32 dev;\n  __u64 ino;\n  __u32 uid;\n  __u32 gid;\n  __u32 pid;\n  __u32 tgid;\n  __u32 len;\n  char name[NAME_MAX+1];\n};\n\n\n\nNote: This structure has some interesting history associated with it. The\nstructure is 300 bytes, but due to alignment issues, on x86_64, the compiler\npads it to 304.\n\nThis isn’t a problem if the kernel and the user space daemon are both 32\nbit or both 64 bit, but becomes a problem when the daemon is 32 bit, but the\nkernel is 64 bit. In that case, the daemon is expecting a 300 byte packet from\nthe kernel but will receive a 304 byte packet.\n\nYou can read more about this problem\nin this lwn.net article.\n\nMessage Types\n\nThere are only 4 types of messages that are sent from the kernel to the user\nspace daemon in the v5 protocol. 
From the automount source:\n\nstatic int handle_packet(struct autofs_point *ap)\n{\n  union autofs_v5_packet_union pkt;\n\n  if (get_pkt(ap, &pkt))\n    return -1;\n\n  debug(ap-&gt;logopt, \"type = %d\", pkt.hdr.type);\n\n  switch (pkt.hdr.type) {\n  case autofs_ptype_missing_indirect:\n    return handle_packet_missing_indirect(ap, &pkt.v5_packet);\n\n  case autofs_ptype_missing_direct:\n    return handle_packet_missing_direct(ap, &pkt.v5_packet);\n\n  case autofs_ptype_expire_indirect:\n    return handle_packet_expire_indirect(ap, &pkt.v5_packet);\n\n  case autofs_ptype_expire_direct:\n    return handle_packet_expire_direct(ap, &pkt.v5_packet);\n  }\n  error(ap-&gt;logopt, \"unknown packet type %d\", pkt.hdr.type);\n  return -1;\n}\n\n\n\nWhen the user space daemon receives one of these messages, it knows what autofs\nfilesystem it is for because the message arrives on the pipe that the daemon\nexplicitly created for this communication. As a result, the packet itself\ndoesn’t need to identify what filesystem it is associated with.\n\nHowever, one very important piece of information that comes through the pipe is\nthe wait_queue_token. This is used by the user space daemon in its ioctl back\nto the kernel to notify the kernel whether or not it was able to process the\nrequest (via AUTOFS_DEV_IOCTL_READY_CMD or AUTOFS_DEV_IOCTL_FAIL_CMD) and\nuniquely identifies the associated request.\n\nMessages of type autofs_ptype_missing_direct and\nautofs_ptype_missing_indirect are sent when a process is trying to access a\ndirectory entry and the kernel needs the userspace daemon to resolve it.\n\nMessages of type autofs_ptype_expire_direct and autofs_ptype_expire_indirect\nare sent when the kernel has realized that a mount is no longer active and that\nit can be umounted, possibly in response to a AUTOFS_DEV_IOCTL_EXPIRE_CMD\ncommand from the daemon. 
In the expire case, the daemon makes a\nsynchronous ioctl call into the kernel: the kernel roots around for a while,\nfigures out that something can be unmounted, then sends a message back to the\ndaemon, which must be handled concurrently.\n\nFor example, here is automount requesting an immediate expiration of /a/b/c,\nwhich causes (1) the kernel to call back into the daemon, (2) the daemon to\nperform the umount and signal success (or failure) via another ioctl, and\nfinally (3) the original ioctl to complete.\n\n1396454302262508340:   automount(18388)(0xffff8803d3cda040) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_EXPIRE_CMD: major: 1, minor 0, size: 24, ioctlfd: 307: how: AUTOFS_EXP_IMMEDIATE (/a/b/c)\n1396454302262523253:   automount(18388)(0xffff8803d3cda040) -&gt; autofs4_notify_daemon: autofs_ptype_expire_direct: pipefd: 48, proto: 5: {.hdr={.proto_version=5, .type=6}, .wait_queue_token=13052, .dev=1048629, .ino=9285341, .uid=0, .gid=0, .pid=486, .tgid=18388, .len=16, .name=\"ffff8800488dcbc0\"}\n1396454302262714273:   automount(18388)(0xffff8803d6cd0080) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD: major: 1, minor 0, size: 57, ioctlfd: -1: type: 0, path: /a/b/c\n1396454302262728637:   automount(18388)(0xffff8803d6cd0080) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD: major: 1, minor 0, size: 57, ioctlfd: -1: devid: 38, magic: 0x6969\n1396454302289469919:   umount.nfs( 489)(0xffff8803d64b4aa0) -&gt; umount: name: /a/b/c\n1396454302289942712:   automount(18388)(0xffff8803d6cd0080) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD: major: 1, minor 0, size: 57, ioctlfd: -1: type: 0, path: /a/b/c\n1396454302289956871:   automount(18388)(0xffff8803d6cd0080) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD: major: 1, minor 0, size: 57, ioctlfd: -1: devid: 1048629, magic: 0x187\n1396454302289984247:   automount(18388)(0xffff8803d6cd0080) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_READY_CMD: major: 1, minor 0, size: 24, 
ioctlfd: 307: token: 13052 (/a/b/c)\n1396454302290005575:   automount(18388)(0xffff8803d6cd0080) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_READY_CMD: major: 1, minor 0, size: 24, ioctlfd: 307: token: 13052 (/a/b/c)\n1396454302290072586:   automount(18388)(0xffff8803d3cda040) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_EXPIRE_CMD: major: 1, minor 0, size: 24, ioctlfd: 307: how: AUTOFS_EXP_IMMEDIATE ()\n\n\n\nNote: In the above output the path in parenthesis is not actually a part of\nthe ioctl. As part of the SystemTap script used to trace this I am taking the\ngiven ioctlfd and looking it up in the open file descriptors for the automount\nprocess and translating it into a path. This made it easier to match up various\ncalls that automount daemon was making to better understand its behavior.\n\nThe User Space Daemon\n\nAt startup the user space daemon will do some initial sanity checking. To do\nthis it mounts a few temporary autofs filesystems and issues ioctls to them to\nensure things are as it expects them to be (version numbers match up etc.).\n\nThe startup steps look like the following:\n\nautomount(29803)(0xffff8803d6cd1540) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_VERSION_CMD: major: 1, minor 0, size: 24, ioctlfd: -1\nautomount(29803)(0xffff8803d6cd1540) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_VERSION_CMD: major: 1, minor 0, size: 24, ioctlfd: -1\nautomount(29803)(0xffff8803d6cd1540) -&gt; mount: dev: automount, dir: /tmp/autoQzUuVP, type: autofs, data: fd=5,pgrp=29803,minproto=3,maxproto=5\nautomount(29803)(0xffff8803d6cd1540) -&gt; autofs4_fill_super: s=0xffff8803d40b8000 data=0xffff88034ce14000 silent=0x0\nautomount(29803)(0xffff8803d6cd1540) &lt;- autofs4_fill_super: {.magic=?, .pipefd=?, .pipe=?, .oz_pgrp=?, .catatonic=?, .version=?, .sub_version=?, .min_proto=?, .max_proto=?, .exp_timeout=?, .type=?, .reghost_enabled=?, .needs_reghost=?, .sb=?, .wq_mutex={...}, .pipe_mutex={...}, .fs_lock={...}, .queues=?, .lookup_lock={...}, .active_list={...}, 
.expiring_list={...}}\nautomount(29803)(0xffff8803d6cd1540) &lt;- mount\nautomount(29803)(0xffff8803d6cd1540) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_OPENMOUNT_CMD: major: 1, minor 0, size: 40, ioctlfd: -1: devid: 23, path: /tmp/autoQzUuVP\nautomount(29803)(0xffff8803d6cd1540) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_OPENMOUNT_CMD: major: 1, minor 0, size: 40, ioctlfd: 5: devid: 23, path: /tmp/autoQzUuVP\nautomount(29803)(0xffff8803d6cd1540) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_CATATONIC_CMD: major: 1, minor 0, size: 24, ioctlfd: 5\nautomount(29803)(0xffff8803d6cd1540) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_CATATONIC_CMD: major: 1, minor 0, size: 24, ioctlfd: 5\nautomount(29803)(0xffff8803d6cd1540) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_PROTOVER_CMD: major: 1, minor 0, size: 24, ioctlfd: 5: 0\nautomount(29803)(0xffff8803d6cd1540) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_PROTOVER_CMD: major: 1, minor 0, size: 24, ioctlfd: 5: 5\nautomount(29803)(0xffff8803d6cd1540) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD: major: 1, minor 0, size: 24, ioctlfd: 5: 0\nautomount(29803)(0xffff8803d6cd1540) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD: major: 1, minor 0, size: 24, ioctlfd: 5: 2\nautomount(29803)(0xffff8803d6cd1540) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD: major: 1, minor 0, size: 24, ioctlfd: 5 (/tmp/autoQzUuVP)\nautomount(29803)(0xffff8803d6cd1540)  umount: name: /tmp/autoQzUuVP\n    mount(29807)(0xffff8803d24d2080) -&gt; mount: dev: /tmp/autoEXHYpt, dir: /tmp/auto4X1sU6, type: none, data: \n    mount(29807)(0xffff8803d24d2080) &lt;- mount\n   umount(29808)(0xffff8803d269e040) -&gt; umount: name: /tmp/auto4X1sU6\n    mount(29810)(0xffff8803d81aeae0) -&gt; mount: dev: /tmp/autoQgrTpK, dir: /tmp/autoWZ7jVn, type: none, data: \n    mount(29810)(0xffff8803d81aeae0) &lt;- mount\n   umount(29811)(0xffff8803d56d4080) -&gt; umount: name: /tmp/autoWZ7jVn\n\n\n\nNext, the daemon consults its maps, sets up the necessary autofs 
filesystems and\nwaits on the pipes it has set up for communication from the kernel.\n\nWhat do the various Ioctls Do?\n\nA lot of this detail is described in\nautofs4-mount-control.txt,\nbut I’ve gone through each one to show examples of their actual invocation as\nobserved with SystemTap. If something I’ve written here conflicts with the\ndocument linked, I’m probably wrong.\n\nAUTOFS_DEV_IOCTL_VERSION_CMD\n\nAs mentioned earlier, this ioctl is basically a no-op. It does some validation\non the major and minor version numbers that you pass in the parameter (all\nioctls on /dev/autofs do this same validation).\n\nSpecifically it validates that the major number is equal to\nAUTOFS_DEV_IOCTL_VERSION_MAJOR and that the minor number is less than or equal\nto AUTOFS_DEV_IOCTL_VERSION_MINOR:\n\n#define AUTOFS_DEV_IOCTL_VERSION_MAJOR    1\n#define AUTOFS_DEV_IOCTL_VERSION_MINOR    0\n\n\n\nThese same major and minor numbers are included in all ioctls from user\nspace to kernel space.\n\nIt’s also worth noting that the ioctlfd in this request is simply set to -1.\n\n1396617872001439657:  automount(30790)(0xffff8803d5ae0aa0) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_VERSION_CMD: major: 1, minor 0, size: 24, ioctlfd: -1\n1396617872001453752:  automount(30790)(0xffff8803d5ae0aa0) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_VERSION_CMD: major: 1, minor 0, size: 24, ioctlfd: -1\n\n\n\nAUTOFS_DEV_IOCTL_PROTOVER_CMD\n\nThis ioctl is issued on a per autofs filesystem basis and therefore requires a\nvalid ioctlfd argument to identify the autofs file system in question.\n\nThe parameter to the ioctl contains a struct args_protover which is initially\n0. On return from the ioctl, the kernel will have filled out this struct with\nthe version field of the struct autofs_sb_info associated with this autofs\nfilesystem in the kernel.\n\nIn practice, this call is only made once on startup for a temporary test autofs\nfilesystem and is never used again after that.\n\n1396617872001615443:  
automount(30790)(0xffff8803d5ae0aa0) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_OPENMOUNT_CMD: major: 1, minor 0, size: 40, ioctlfd: -1: devid: 21, path: /tmp/autoV2cNiF\n1396617872001623807:  automount(30790)(0xffff8803d5ae0aa0) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_OPENMOUNT_CMD: major: 1, minor 0, size: 40, ioctlfd: 5: devid: 21, path: /tmp/autoV2cNiF\n1396617872001629141:  automount(30790)(0xffff8803d5ae0aa0) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_CATATONIC_CMD: major: 1, minor 0, size: 24, ioctlfd: 5\n1396617872001633477:  automount(30790)(0xffff8803d5ae0aa0) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_CATATONIC_CMD: major: 1, minor 0, size: 24, ioctlfd: 5\n\n(These prior commands are run to open a file descriptor for the mount point after mounting)\n\n1396617872001636885:  automount(30790)(0xffff8803d5ae0aa0) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_PROTOVER_CMD: major: 1, minor 0, size: 24, ioctlfd: 5: 0\n1396617872001648323:  automount(30790)(0xffff8803d5ae0aa0) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_PROTOVER_CMD: major: 1, minor 0, size: 24, ioctlfd: 5: 5\n\n\n\nAUTOFS_DEV_IOCTL_PROTOSUBVER_CMD\n\nThis ioctl is issued on a per autofs filesystem basis and therefore requires a\nvalid ioctlfd argument to identify the autofs file system in question.\n\nThe parameter to the ioctl contains a struct args_protosubver which is\ninitially 0. 
On return from the ioctl, the kernel will have filled out this\nstruct with the sub_version field of struct autofs_sb_info associated with\nthis autofs filesystem in the kernel.\n\nIn practice, this call is only made once on startup for a temporary test autofs\nfilesystem and is never used again after that.\n\n1396617872001651494:  automount(30790)(0xffff8803d5ae0aa0) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD: major: 1, minor 0, size: 24, ioctlfd: 5: 0\n1396617872001654267:  automount(30790)(0xffff8803d5ae0aa0) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD: major: 1, minor 0, size: 24, ioctlfd: 5: 2\n\n\n\nAUTOFS_DEV_IOCTL_OPENMOUNT_CMD\n\nThis ioctl is issued after an autofs filesystem is mounted in order to obtain\nthe ioctlfd that can then be used for subsequent ioctls referencing that\nautofs file system. It’s purpose is to obtain the ioctlfd and therefore you do\nnot need to have a valid ioctlfd in order to issue this ioctl.\n\nThis actually opens a new file descriptor in the process for the directory which\nis the mount point of the autofs file system.\n\nThe parameter to the ioctl contains a struct args_openmount. 
In addition, the\nparameter contains the actual path of the autofs filesystem that you are trying\nto open as a string.\n\nIn practice this ioctl is issued once for each autofs filesystem that the user\nspace daemon is managing.\n\n1396879456563815509:  automount(10681)(0xffff8803d82ce040) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_OPENMOUNT_CMD: major: 1, minor 0, size: 31, ioctlfd: -1: devid: 24, path: /a/b/c\n1396879456563823041:  automount(10681)(0xffff8803d82ce040) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_OPENMOUNT_CMD: major: 1, minor 0, size: 31, ioctlfd: 28: devid: 24, path: /a/b/c\n\n[cperl@tot-qws-u12114d ~]$ sudo lsof -p $(pgrep -f /etc/auto.master) | grep /a/b/c\nautomount 10681 root   28r   DIR   0,24        0 16335017 /a/b/c\n\n\n\nAs you can see, the ioctlfd field is -1 on entry to the ioctl and is filled\nout by the kernel. Furthermore, we can confirm with lsof that the file\ndescriptor is now open for the directory.\n\nAUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD\n\nThis ioctl is issued to signal that the user space daemon no longer needs to\nissue ioctls for a given autofs filesystem.\n\nFor this ioctl, there are no additional details passed in a structure. The only\npiece of information needed is the ioctlfd that the daemon is requesting to be\nclosed.\n\nThis actually closes the file descriptor in the process.\n\nIn practice, this ioctl is used to close down the temporary autofs filesystem\nthat is used during early initialization to check the various version\ninformation. 
Otherwise it only appears to be used when shutting down automount.\n\n1396617872001657498:  automount(30790)(0xffff8803d5ae0aa0) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD: major: 1, minor 0, size: 24, ioctlfd: 5 (/tmp/autoV2cNiF)\n1396617872001663579:  automount(30790)(0xffff8803d5ae0aa0) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD: major: 1, minor 0, size: 24, ioctlfd: 5 ()\n\n\n\nAUTOFS_DEV_IOCTL_READY_CMD\n\nThis ioctl is one of two ioctls issued in response to a request from the kernel.\nThis particular ioctl signals that the request processing was successful and the\nkernel can continue.\n\nThe parameter to the ioctl contains a struct args_ready. This contains the\nsame token that was received from the kernel on the pipe where the request\noriginated. This is how the kernel can match a response to its initial request\n(along with the fact that the ioctlfd identifies a particular autofs\nfilesystem).\n\nIn practice, this ioctl is used all the time. Every time the user space daemon\nsuccessfully handles automounting a filesystem, it communicates that to the\nkernel via this ioctl.\n\n1396617916437019856:         hg(30925)(0xffff8803d18c4aa0) -&gt; autofs4_notify_daemon: autofs_ptype_missing_indirect: pipefd: 7, proto: 5: {.hdr={.proto_version=5, .type=3}, .wait_queue_token=52225, .dev=21, .ino=11251159, .uid=12114, .gid=32771, .pid=30925, .tgid=30925, .len=6, .name=\".hg\"}\n...\n1396617916440211973:  automount(30790)(0xffff8803d11db500) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_READY_CMD: major: 1, minor 0, size: 24, ioctlfd: 11: token: 52225 (/foo)\n1396617916440230486:  automount(30790)(0xffff8803d11db500) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_READY_CMD: major: 1, minor 0, size: 24, ioctlfd: 11: token: 52225 (/foo)\n\n\n\nIn the above output I’ve included the request from hg that caused the\nnotification to be sent to the automount daemon (the kernel function which does\nthat is autofs4_notify_daemon). 
Here you can see the matching of\nwait_queue_token in the request and token in the response.\n\nAUTOFS_DEV_IOCTL_FAIL_CMD\n\nThis ioctl is the second of the two ioctls issued in response to a request from\nthe kernel. This command is similar to the ready command described above, but\nindicates failure.\n\nThe parameter to the ioctl contains a struct args_fail which contains both the\ntoken that the daemon received from the kernel as well as a status code.\n\nThis status code is what the kernel should return to the process that requested\nthe access. Many times this will be ENOENT, but it can also be other things like\nENOMEM or ENAMETOOLONG.\n\nIt’s worth pointing out that the kernel uses negative numbers to communicate\nerrors, and therefore the status returned is -2, i.e. -ENOENT. For instance, when\nthe automounter calls get_ioctl_ops()-&gt;send_fail, the argument that it passes\nis always something like -ENOENT, -ENOMEM, or -ENAMETOOLONG.\n\n1396617921296295641:       stat(30998)(0xffff8803d191c040) -&gt; autofs4_notify_daemon: autofs_ptype_missing_indirect: pipefd: 7, proto: 5: {.hdr={.proto_version=5, .type=3}, .wait_queue_token=52228, .dev=21, .ino=11251159, .uid=12114, .gid=32771, .pid=30998, .tgid=30998, .len=3, .name=\".hg\"}\n...\n1396617921296848457:  automount(30790)(0xffff8803d19f8040) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_FAIL_CMD: major: 1, minor 0, size: 24, ioctlfd: 11: token: 52228, status: -2 (/foo)\n1396617921296870360:  automount(30790)(0xffff8803d19f8040) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_FAIL_CMD: major: 1, minor 0, size: 24, ioctlfd: 11: token: 52228, status: -2 (/foo)\n\n\n\nAs you can see here, the status code for the failure is -ENOENT.\n\nAUTOFS_DEV_IOCTL_SETPIPEFD_CMD\n\nThis ioctl can be used to set the pipe file descriptor that the kernel will use\nto send notifications to the user space daemon. The other option would be to\npass the file descriptor during the actual call to mount(8). 
This ioctl only\nmakes sense when you already have a valid ioctlfd to issue it against.\n\nIf the automount daemon was restarted and some of its autofs filesystems that\nit’s supposed to be managing are shadowed by the real mounts (and therefore still\nmounted), there would otherwise be no way to establish this pipe file descriptor.\nThat is the purpose of this ioctl.\n\n1396617872058414208:  automount(30790)(0xffff8803d67cd500) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_SETPIPEFD_CMD: major: 1, minor 0, size: 24, ioctlfd: 17: pipefd: 13 (/home)\n1396617872058417490:  automount(30790)(0xffff8803d67cd500) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_SETPIPEFD_CMD: major: 1, minor 0, size: 24, ioctlfd: 17: pipefd: 13 (/home)\n\n\n\nAdditional data from\nautofs4-mount-control.txt:\n\n\n  The call requires an initialized struct autofs_dev_ioctl with the ioctlfd\nfield set to the descriptor obtained from the open call and the arg1 field set\nto descriptor of the pipe. On success the call also sets the process group id\nused to identify the controlling process (eg. the owning automount(8)\ndaemon) to the process group of the caller.\n\n\nAUTOFS_DEV_IOCTL_CATATONIC_CMD\n\nThis ioctl is issued against a specific autofs filesystem and therefore you must\nhave a valid ioctlfd to use before it can be issued.\n\nThis ioctl asks the kernel to mark the struct autofs_sb_info as no longer\nbeing responsive to mount requests. In addition, it closes the kernel’s side of\nthe pipe.\n\nIf an autofs filesystem is marked as catatonic, then there is no longer any\n“magic” going on. 
Requests are not dispatched to the user space daemon and all\nprocesses are allowed to see the raw filesystem (not just the process id that\nwas given as the pgrp mount option when the filesystem was mounted).\n\nMore details from\nautofs4-mount-control.txt\nstate that this command is a prerequisite for AUTOFS_DEV_IOCTL_SETPIPEFD_CMD:\n\n\n  In order to protect mounts against incorrectly setting the pipe descriptor we\nalso require that the autofs mount be catatonic (see next call).\n\n\nAnd a few examples of its use:\n\n1396617872001439657:  automount(30790)(0xffff8803d5ae0aa0) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_VERSION_CMD: major: 1, minor 0, size: 24, ioctlfd: -1\n1396617872001453752:  automount(30790)(0xffff8803d5ae0aa0) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_VERSION_CMD: major: 1, minor 0, size: 24, ioctlfd: -1\n1396617872001538014:  automount(30790)(0xffff8803d5ae0aa0) -&gt; mount: dev: automount, dir: /tmp/autoV2cNiF, type: autofs, data: fd=5,pgrp=30790,minproto=3,maxproto=5\n1396617872001570079:  automount(30790)(0xffff8803d5ae0aa0) -&gt; autofs4_fill_super: s=0xffff8803d8ba5c00 data=0xffff880210282000 silent=0x0\n1396617872001582788:  automount(30790)(0xffff8803d5ae0aa0) &lt;- autofs4_fill_super: {.magic=?, .pipefd=?, .pipe=?, .oz_pgrp=?, .catatonic=?, .version=?, .sub_version=?, .min_proto=?, .max_proto=?, .exp_timeout=?, .type=?, .reghost_enabled=?, .needs_reghost=?, .sb=?, .wq_mutex={...}, .pipe_mutex={...}, .fs_lock={...}, .queues=?, .lookup_lock={...}, .active_list={...}, .expiring_list={...}}\n1396617872001602201:  automount(30790)(0xffff8803d5ae0aa0) &lt;- mount\n1396617872001615443:  automount(30790)(0xffff8803d5ae0aa0) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_OPENMOUNT_CMD: major: 1, minor 0, size: 40, ioctlfd: -1: devid: 21, path: /tmp/autoV2cNiF\n1396617872001623807:  automount(30790)(0xffff8803d5ae0aa0) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_OPENMOUNT_CMD: major: 1, minor 0, size: 40, ioctlfd: 5: devid: 21, path: 
/tmp/autoV2cNiF\n1396617872001629141:  automount(30790)(0xffff8803d5ae0aa0) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_CATATONIC_CMD: major: 1, minor 0, size: 24, ioctlfd: 5 (/tmp/autoV2cNiF)\n1396617872001633477:  automount(30790)(0xffff8803d5ae0aa0) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_CATATONIC_CMD: major: 1, minor 0, size: 24, ioctlfd: 5 (/tmp/autoV2cNiF)\n1396617872001636885:  automount(30790)(0xffff8803d5ae0aa0) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_PROTOVER_CMD: major: 1, minor 0, size: 24, ioctlfd: 5: 0 (/tmp/autoV2cNiF)\n1396617872001648323:  automount(30790)(0xffff8803d5ae0aa0) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_PROTOVER_CMD: major: 1, minor 0, size: 24, ioctlfd: 5: 5 (/tmp/autoV2cNiF)\n1396617872001651494:  automount(30790)(0xffff8803d5ae0aa0) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD: major: 1, minor 0, size: 24, ioctlfd: 5: 0 (/tmp/autoV2cNiF)\n1396617872001654267:  automount(30790)(0xffff8803d5ae0aa0) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD: major: 1, minor 0, size: 24, ioctlfd: 5: 2 (/tmp/autoV2cNiF)\n1396617872001657498:  automount(30790)(0xffff8803d5ae0aa0) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD: major: 1, minor 0, size: 24, ioctlfd: 5 (/tmp/autoV2cNiF)\n1396617872001663579:  automount(30790)(0xffff8803d5ae0aa0) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD: major: 1, minor 0, size: 24, ioctlfd: 5 ()\n1396617872001668692:  automount(30790)(0xffff8803d5ae0aa0) -&gt; umount: name: /tmp/autoV2cNiF\n1396617872005398387:      mount(30794)(0xffff8803d82cd500) -&gt; mount: dev: /tmp/auto3HiGR8, dir: /tmp/auto8FUzqC, type: none, data: \n1396617872005425776:      mount(30794)(0xffff8803d82cd500) &lt;- mount\n1396617872006666370:     umount(30795)(0xffff8803d67cd500) -&gt; umount: name: /tmp/auto8FUzqC\n1396617872009924917:      mount(30797)(0xffff8803d18c4040) -&gt; mount: dev: /tmp/autoFHRl05, dir: /tmp/autowC47zz, type: none, data: \n1396617872009966875:      
mount(30797)(0xffff8803d18c4040) &lt;- mount\n1396617872011350661:     umount(30798)(0xffff8803d269e040) -&gt; umount: name: /tmp/autowC47zz\n\n\n\nOR\n\n1396617872058378954:  automount(30790)(0xffff8803d67cd500) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD: major: 1, minor 0, size: 30, ioctlfd: -1: type: 1, path: /home\n1396617872058386150:  automount(30790)(0xffff8803d67cd500) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD: major: 1, minor 0, size: 30, ioctlfd: -1: devid: 22, magic: 0x187\n1396617872058391034:  automount(30790)(0xffff8803d67cd500) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_OPENMOUNT_CMD: major: 1, minor 0, size: 30, ioctlfd: -1: devid: 22, path: /home\n1396617872058396234:  automount(30790)(0xffff8803d67cd500) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_OPENMOUNT_CMD: major: 1, minor 0, size: 30, ioctlfd: 17: devid: 22, path: /home\n1396617872058407576:  automount(30790)(0xffff8803d67cd500) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_CATATONIC_CMD: major: 1, minor 0, size: 24, ioctlfd: 17 (/home)\n1396617872058410942:  automount(30790)(0xffff8803d67cd500) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_CATATONIC_CMD: major: 1, minor 0, size: 24, ioctlfd: 17 (/home)\n1396617872058414208:  automount(30790)(0xffff8803d67cd500) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_SETPIPEFD_CMD: major: 1, minor 0, size: 24, ioctlfd: 17: pipefd: 13 (/home)\n1396617872058417490:  automount(30790)(0xffff8803d67cd500) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_SETPIPEFD_CMD: major: 1, minor 0, size: 24, ioctlfd: 17: pipefd: 13 (/home)\n1396617872058420869:  automount(30790)(0xffff8803d67cd500) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_TIMEOUT_CMD: major: 1, minor 0, size: 24, ioctlfd: 17: timeout: 604800 (/home)\n1396617872058426251:  automount(30790)(0xffff8803d67cd500) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_TIMEOUT_CMD: major: 1, minor 0, size: 24, ioctlfd: 17: timeout: 604800 (/home)\n1396617872058436785:  automount(30790)(0xffff8803d67cd500) -&gt; 
autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD: major: 1, minor 0, size: 30, ioctlfd: 17: type: 0, path: /home\n1396617872058441090:  automount(30790)(0xffff8803d67cd500) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD: major: 1, minor 0, size: 30, ioctlfd: 17: devid: 22, magic: 0x0\n1396617872058471734:  automount(30790)(0xffff8803d67cd500) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD: major: 1, minor 0, size: 36, ioctlfd: -1: type: 0, path: /home/cperl\n1396617872058477028:  automount(30790)(0xffff8803d67cd500) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD: major: 1, minor 0, size: 36, ioctlfd: -1: devid: 27, magic: 0x6969\n1396617872058480975:  automount(30790)(0xffff8803d67cd500) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_REQUESTER_CMD: major: 1, minor 0, size: 36, ioctlfd: 17: uid: 0, gid: 0 (/home)\n1396617872058484928:  automount(30790)(0xffff8803d67cd500) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_REQUESTER_CMD: major: 1, minor 0, size: 36, ioctlfd: 17: uid: 0, gid: 0 (/home)\n\n\n\nAUTOFS_DEV_IOCTL_TIMEOUT_CMD\n\nThis ioctl is issued against a specific autofs filesystem and therefore you must\nhave a valid ioctlfd to use before it can be issued.\n\nThe parameter to the ioctl contains a struct args_timeout. 
This contains the\ntimeout for the autofs filesystem in seconds.\n\nInternally the kernel converts this to jiffies and stores it in the\nexp_timeout field of the associated struct autofs_sb_info.\n\n1396617872058420869:  automount(30790)(0xffff8803d67cd500) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_TIMEOUT_CMD: major: 1, minor 0, size: 24, ioctlfd: 17: timeout: 604800 (/home)\n1396617872058426251:  automount(30790)(0xffff8803d67cd500) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_TIMEOUT_CMD: major: 1, minor 0, size: 24, ioctlfd: 17: timeout: 604800 (/home)\n\n\n\nAUTOFS_DEV_IOCTL_REQUESTER_CMD\n\nThis ioctl is issued against a specific autofs filesystem and therefore you must\nhave a valid ioctlfd to use before it can be issued.\n\nThe parameter to the ioctl contains a struct args_requester. This struct is\ninitially all 0 on the call and output parameters are filled in before returning\nfrom the ioctl.\n\n1396530871075227821:  automount(29803)(0xffff8803d8aeb540) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_REQUESTER_CMD: major: 1, minor 0, size: 46, ioctlfd: 24: uid: 0, gid: 0 (/home/cperl)\n1396530871075232024:  automount(29803)(0xffff8803d8aeb540) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_REQUESTER_CMD: major: 1, minor 0, size: 46, ioctlfd: 24: uid: 12114, gid: 32771 (/home/cperl)\n\n\n\nThe rationale for this ioctl as quoted from\nautofs4-mount-control.txt:\n\n\n  In addition, to be able to reconstruct a mount tree that has busy mounts, the\nuid and gid of the last user that triggered the mount needs to be available\nbecause these can be used as macro substitution variables in autofs maps. 
They\nare recorded at mount request time and an operation has been added to retrieve\nthem.\n\n\nAUTOFS_DEV_IOCTL_EXPIRE_CMD\n\nThis ioctl is issued against a specific autofs filesystem and therefore you must\nhave a valid ioctlfd to use before it can be issued.\n\nThe parameter to the ioctl contains a struct args_expire, which holds a bit\nfield in which two flags can be set:\n\n/* Mask for expire behaviour */\n#define AUTOFS_EXP_IMMEDIATE    1\n#define AUTOFS_EXP_LEAVES       2\n\n\n\nIn practice, I’ve seen this ioctl called with no flags set, or just\nAUTOFS_EXP_IMMEDIATE set. It’s possible that it will use AUTOFS_EXP_LEAVES\nunder some circumstances, but none that I’ve hit yet.\n\n1396470011128585542:  automount(24018)(0xffff8803d49f8aa0) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_EXPIRE_CMD: major: 1, minor 0, size: 24, ioctlfd: 34: how: AUTOFS_EXP_IMMEDIATE (/a/b/c)\n1396470011128619772:  automount(24018)(0xffff8803d49f8aa0) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_EXPIRE_CMD: major: 1, minor 0, size: 24, ioctlfd: 34: how: AUTOFS_EXP_IMMEDIATE (/a/b/c)\n\n\n\nOR\n\n1396453375060672511:  automount(18388)(0xffff8803d6cd1540) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_EXPIRE_CMD: major: 1, minor 0, size: 24, ioctlfd: 34: how:  (/a/b/c)\n1396453375060687029:  automount(18388)(0xffff8803d6cd1540) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_EXPIRE_CMD: major: 1, minor 0, size: 24, ioctlfd: 34: how:  (/a/b/c)\n\n\n\nFurther details from\nautofs4-mount-control.txt:\n\n\n  Issue an expire request to the kernel for an autofs mount. Typically this\nioctl is called until no further expire candidates are found. The call\nrequires an initialized struct autofs_dev_ioctl with the ioctlfd field set\nto the descriptor obtained from the open call. In addition an immediate\nexpire, independent of the mount timeout, can be requested by setting the arg1\nfield to 1. If no expire candidates can be found the ioctl returns -1 with\nerrno set to EAGAIN. 
This call causes the kernel module to check the mount\ncorresponding to the given ioctlfd for mounts that can be expired, issues an\nexpire request back to the daemon and waits for completion.\n\n\nAUTOFS_DEV_IOCTL_ASKUMOUNT_CMD\n\nThis ioctl is issued against a specific autofs filesystem and therefore you must\nhave a valid ioctlfd to use before it can be issued.\n\nThe parameter to the ioctl contains a struct args_askumount. Its may_umount\nfield is set to 0 by the user space daemon and is either set to 1 by the kernel\nto indicate that the mount point may be unmounted, or left at 0 to indicate that\nthe mount point is still busy.\n\n1396453065009385966:  automount(18388)(0xffff8803d64b4aa0) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD: major: 1, minor 0, size: 24, ioctlfd: 34: may_umount: 0 (/foo)\n1396453065009390765:  automount(18388)(0xffff8803d64b4aa0) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD: major: 1, minor 0, size: 24, ioctlfd: 34: may_umount: 0 (/foo)\n\n\n\nOR\n\n1396454296928731407:  automount(18388)(0xffff8803d5ae0040) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD: major: 1, minor 0, size: 24, ioctlfd: 34: may_umount: 0 (/a/b/c)\n1396454296928738658:  automount(18388)(0xffff8803d5ae0040) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD: major: 1, minor 0, size: 24, ioctlfd: 34: may_umount: 1 (/a/b/c)\n\n\n\nAUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD\n\nThis ioctl can be issued with or without an ioctlfd. If an ioctlfd is passed,\nit is used to find the mount point to operate on; otherwise you must set\nioctlfd to -1 and pass a string path in the argument.\n\nThe parameter to the ioctl contains a struct args_ismountpoint. This struct is\na union that defines different arguments for the way in and for the way out.\n\nOn the way in, it contains a string path to inquire about. On return it fills\nout the devid and the magic number. 
Using the magic number, the user space\ndaemon can determine what type of filesystem is mounted at that path.\n\nFor example the following are defined in include/linux/magic.h:\n\n#define AUTOFS_SUPER_MAGIC    0x0187\n#define NFS_SUPER_MAGIC       0x6969\n\n\n\nAnd you can see examples of the returned values from the calls. The user space\ndaemon uses this information to determine what action it might need to take.\n\nBelow are three examples of what it might return (autofs, nfs, or nothing). The\nnothing case would correspond to a directory that had been created beneath an\nindirect autofs mount point for ghosting:\n\n1396454169249848027:  automount(18388)(0xffff8803d2643500) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD: major: 1, minor 0, size: 44, ioctlfd: -1: type: 0, path: /home/cperl\n1396454169249853221:  automount(18388)(0xffff8803d2643500) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD: major: 1, minor 0, size: 44, ioctlfd: -1: devid: 200, magic: 0x187\n\n1396454169249858444:  automount(18388)(0xffff8803d2643500) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD: major: 1, minor 0, size: 44, ioctlfd: -1: type: 0, path: /foo\n1396454169249863856:  automount(18388)(0xffff8803d2643500) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD: major: 1, minor 0, size: 44, ioctlfd: -1: devid: 38, magic: 0x6969\n\n1396454169257121837:  automount(18388)(0xffff8803d11db500) -&gt; autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD: major: 1, minor 0, size: 47, ioctlfd: -1: type: 0, path: /home/dlobraico\n1396454169257128955:  automount(18388)(0xffff8803d11db500) &lt;- autofs_dev_ioctl: AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD: major: 1, minor 0, size: 47, ioctlfd: -1: devid: 25, magic: 0x0\n\n\n\nRelevant quotes from\nautofs4-mount-control.txt:\n\n\n  Since we’re re-implementing the control interface, a couple of other problems\nwith the existing interface have been addressed. 
First, when a mount or expire\noperation completes a status is returned to the kernel by either a “send\nready” or a “send fail” operation. The “send fail” operation of the ioctl\ninterface could only ever send ENOENT so the re-implementation allows user\nspace to send an actual status. Another expensive operation in user space, for\nthose using very large maps, is discovering if a mount is present. Usually\nthis involves scanning /proc/mounts and since it needs to be done quite often\nit can introduce significant overhead when there are many entries in the mount\ntable. An operation to lookup the mount status of a mount point dentry\n(covered or not) has also been added.\n\n  The call requires an initialized struct autofs_dev_ioctl. There are two\npossible variations. Both use the path field set to the path of the mount\npoint to check and the size field adjusted appropriately. One uses the ioctlfd\nfield to identify a specific mount point to check while the other variation\nuses the path and optionally arg1 set to an autofs mount type. The call\nreturns 1 if this is a mount point and sets arg1 to the device number of the\nmount and field arg2 to the relevant super block magic number (described\nbelow) or 0 if it isn’t a mountpoint. In both cases the device number (as\nreturned by new_encode_dev()) is returned in field arg1.\n\n\nSystemTap Script\n\nBelow is the SystemTap script that came out of debugging/tracing automount(8).\nThe script started off very small and targeted to see what was happening inside\nthe kernel, but continually grew over time.\n\nEach time I answered one question about how automount worked I would uncover\nseveral more. 
This script should not be considered a general purpose tool, as it\nplaces specific user space probes at specific lines in the automount source\nand would therefore need to be adjusted before any future use (i.e.\nline numbers will certainly change).\n\nOne particular limitation of SystemTap (or perhaps of user space probing in\ngeneral; I haven’t looked that hard) that I came across while investigating this\nis that the SystemTap script that places user space probes must be set up and\nrunning before the process that those probes are meant to trace.\n\nIf you start the process first (or leave the process running), you won’t see any\nof the user space stuff, which was ultimately quite important to understanding\nautomount(8)’s behavior and finding the bug in its handling of negative cache\nentries.\n\n/*\n  * !!! DO NOT USE THIS ON A PRODUCTION MACHINE !!!\n  *\n  * Run this with:\n  * stap -vv --vp 10101 \\\n  *   -d /usr/sbin/automount \\\n  *   -d /usr/lib64/autofs/lookup_file.so \\\n  *   -d /usr/lib64/autofs/parse_sun.so \\\n  *   -d /usr/lib64/autofs/mount_nfs.so \\\n  *   --ldd \\\n  *   -D MAXSTRINGLEN=1024 \\\n  *   -g \\\n  *   ./autofs.stp\n  */\n\n/* for selective tracing */\nglobal trace;\n\n/* for constant to string translation */\nglobal _autofs_kernel_notify_types[4]\nglobal _autofs_nsswitch_return_status[5]\n\nprobe begin {\n    /** Autofs kernel to user space notify types **/\n    _autofs_kernel_notify_types[0x3] = \"autofs_ptype_missing_indirect\";\n    _autofs_kernel_notify_types[0x4] = \"autofs_ptype_expire_indirect\";\n    _autofs_kernel_notify_types[0x5] = \"autofs_ptype_missing_direct\";\n    _autofs_kernel_notify_types[0x6] = \"autofs_ptype_expire_direct\";\n\n    /** Autofs nss return codes from include/nsswitch.h **/\n    _autofs_nsswitch_return_status[0x0] = \"NSS_STATUS_SUCCESS\";\n    _autofs_nsswitch_return_status[0x1] = \"NSS_STATUS_NOTFOUND\";\n    _autofs_nsswitch_return_status[0x2] = \"NSS_STATUS_UNAVAIL\";\n    
_autofs_nsswitch_return_status[0x4] = \"NSS_STATUS_TRYAGAIN\";\n    _autofs_nsswitch_return_status[0x5] = \"NSS_STATUS_MAX\";\n\n}\n\n/* Stolen from /usr/share/doc/systemtap-client-1.8/examples/process/pfiles.stp */\nfunction task_file_handle_d_path:string (task:long, fd:long) %{ /* pure */\n    struct task_struct *p = (struct task_struct *)((long)STAP_ARG_task);\n    struct files_struct *files;\n    char *page = NULL;\n    struct file *filp;\n    struct dentry *dentry;\n    struct vfsmount *vfsmnt;\n    char *path = NULL;\n\n    rcu_read_lock();\n    if ((files = kread(&p-&gt;files)) &&\n        // We need GFP_ATOMIC since we're inside a lock so we\n        // can't sleep.\n        (page = (char *)__get_free_page(GFP_ATOMIC)) &&\n        (filp = fcheck_files(files, STAP_ARG_fd))) {\n\n#if LINUX_VERSION_CODE &gt;= KERNEL_VERSION(2,6,26)\n        /* git commit 9d1bc601 */\n        path = d_path(&filp-&gt;f_path, page, PAGE_SIZE);\n#else\n        dentry = kread(&filp-&gt;f_dentry);\n        vfsmnt = kread(&filp-&gt;f_vfsmnt);\n\n        if (dentry && vfsmnt) {\n            path = d_path(dentry, vfsmnt, page, PAGE_SIZE);\n        }\n#endif\n        if (path && !IS_ERR(path)) {\n            snprintf(STAP_RETVALUE, MAXSTRINGLEN, \"%s\", path);\n        }\n    }\n    CATCH_DEREF_FAULT();\n\n    if (page) free_page((unsigned long)page);\n\n    rcu_read_unlock();\n%}\n\nfunction log_common:string () {\n    return sprintf(\"%d: %10s(%5d)(0x%x)\", gettimeofday_ns(), execname(), pid(), task_current());\n}\n\nfunction autofs_dev_ioctl2str:string (cmd:long, pkt:long, outgoing:long) {\n    major   = user_uint32(pkt);\n    minor   = user_uint32(pkt+4);\n    size    = user_uint32(pkt+8);\n    ioctlfd = user_int32(pkt+12);\n\n    task = task_current();\n    common = sprintf(\"major: %d, minor %d, size: %d, ioctlfd: %d\", major, minor, size, ioctlfd);\n\n    cmd = cmd & 255;\n    if (cmd == 0x71) {\n        s = \"AUTOFS_DEV_IOCTL_VERSION_CMD\";\n        return sprintf(\"%s: %s\", s, 
common);\n    }\n\n    else if (cmd == 0x72) {\n        s = \"AUTOFS_DEV_IOCTL_PROTOVER_CMD\";\n        v = user_uint32(pkt+16);\n        return sprintf(\"%s: %s: %d\", s, common, v);\n    }\n\n    else if (cmd == 0x73) {\n        s = \"AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD\";\n        v = user_uint32(pkt+16);\n        return sprintf(\"%s: %s: %d\", s, common, v);\n    }\n\n    else if (cmd == 0x74) {\n        s = \"AUTOFS_DEV_IOCTL_OPENMOUNT_CMD\";\n        sz = size - 24;\n        devid = user_uint32(pkt+16);\n        path  = user_string_n(pkt+24, sz);\n        return sprintf(\"%s: %s: devid: %d, path: %s\", s, common, devid, path);\n    }\n\n    else if (cmd == 0x75) {\n        s = \"AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD\"; \n        path = task_file_handle_d_path(task, ioctlfd);\n        return sprintf(\"%s: %s (%s)\", s, common, path);\n    }\n\n    else if (cmd == 0x76) {\n        s = \"AUTOFS_DEV_IOCTL_READY_CMD\";\n        token = user_uint32(pkt+16);\n        path = task_file_handle_d_path(task, ioctlfd);\n        return sprintf(\"%s: %s: token: %d (%s)\", s, common, token, path);\n    }\n\n    else if (cmd == 0x77) {\n        s = \"AUTOFS_DEV_IOCTL_FAIL_CMD\";\n        token = user_uint32(pkt+16);\n        status = user_int32(pkt+20);\n        path = task_file_handle_d_path(task, ioctlfd);\n        return sprintf(\"%s: %s: token: %d, status: %d (%s)\", s, common, token, status, path);\n    }\n\n    else if (cmd == 0x78) {\n        s = \"AUTOFS_DEV_IOCTL_SETPIPEFD_CMD\";\n        pipefd = user_int32(pkt+16);\n        path = task_file_handle_d_path(task, ioctlfd);\n        return sprintf(\"%s: %s: pipefd: %d (%s)\", s, common, pipefd, path);\n    }\n\n    else if (cmd == 0x79) {\n        s = \"AUTOFS_DEV_IOCTL_CATATONIC_CMD\";\n        path = task_file_handle_d_path(task, ioctlfd);\n        return sprintf(\"%s: %s (%s)\", s, common, path);\n    }\n\n    else if (cmd == 0x7A) {\n        s = \"AUTOFS_DEV_IOCTL_TIMEOUT_CMD\";\n        timeout = user_uint64(pkt+16);\n     
   path = task_file_handle_d_path(task, ioctlfd);\n        return sprintf(\"%s: %s: timeout: %d (%s)\", s, common, timeout, path);\n    }\n\n    else if (cmd == 0x7B) {\n        s = \"AUTOFS_DEV_IOCTL_REQUESTER_CMD\";\n        uid = user_uint32(pkt+16);\n        gid = user_uint32(pkt+20);\n        path = task_file_handle_d_path(task, ioctlfd);\n        return sprintf(\"%s: %s: uid: %d, gid: %d (%s)\", s, common, uid, gid, path);\n    }\n\n    else if (cmd == 0x7C) {\n        s = \"AUTOFS_DEV_IOCTL_EXPIRE_CMD\";\n        how = user_uint32(pkt+16);\n        path = task_file_handle_d_path(task, ioctlfd);\n        h = \"\";\n\n        if (how & 1) {\n            h = \"AUTOFS_EXP_IMMEDIATE\";\n        }\n\n        if (how & 2) {\n            t = \"AUTOFS_EXP_LEAVES\";\n            h = h == \"\" ? t : sprintf(\"%s|%s\", h, t);\n        }\n\n        return sprintf(\"%s: %s: how: %s (%s)\", s, common, h, path);\n    }\n\n    else if (cmd == 0x7D) {\n        s = \"AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD\";\n        may_umount = user_uint32(pkt+16);\n        path = task_file_handle_d_path(task, ioctlfd);\n        return sprintf(\"%s: %s: may_umount: %d (%s)\", s, common, may_umount, path);\n    }\n\n    /* ISMOUNTPOINT is interpreted differently on the way in than on the way out */\n    else if (cmd == 0x7E) {\n        s = \"AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD\";\n        if (!outgoing) {\n            sz = size - 24;\n            type = user_uint32(pkt+16);\n            path = user_string_n(pkt+24, sz);\n            return sprintf(\"%s: %s: type: %d, path: %s\", s, common, type, path);\n        }\n        else {\n            devid = user_uint32(pkt+16);\n            magic = user_uint32(pkt+20);\n            return sprintf(\"%s: %s: devid: %d, magic: 0x%X\", s, common, devid, magic);\n        }\n    }\n\n    else {\n        return \"UNKNOWN\"\n    }\n}\n\nfunction autofs_kernel_notify2str:string (typ:long) {\n    return typ in _autofs_kernel_notify_types ? 
_autofs_kernel_notify_types[typ] : \"UNKNOWN\"\n}\n\nfunction nsswitch_status2str:string (typ:long) {\n    return typ in _autofs_nsswitch_return_status ?  _autofs_nsswitch_return_status[typ] : \"NSS_STATUS_UNKNOWN\"\n}\n\nprobe module(\"autofs4\").function(\"autofs_dev_ioctl\").call\n{\n    if (execname() == \"automount\") {\n        printf(\"%s -&gt; %s: %s\\n\",\n            log_common(),\n            \"autofs_dev_ioctl\",\n            autofs_dev_ioctl2str($command, $u, 0));\n    }\n}\n\nprobe module(\"autofs4\").function(\"autofs_dev_ioctl\").return\n{\n    if (execname() == \"automount\") {\n        printf(\"%s &lt;- %s: %s\\n\",\n            log_common(),\n            \"autofs_dev_ioctl\",\n            autofs_dev_ioctl2str($command, $u, 1));\n    }\n}\n\nprobe kernel.function(\"sys_mount\").call\n{\n    dev = user_string2($dev_name, \"\");\n    dir = user_string2($dir_name, \"\");\n    typ = user_string2($type, \"\");\n    dat = user_string2($data, \"\");\n    printf(\"%s -&gt; %s: dev: %s, dir: %s, type: %s, data: %s\\n\",\n        log_common(), \"mount\", dev, dir, typ, dat);\n}\n\nprobe kernel.function(\"sys_mount\").return\n{\n    printf(\"%s &lt;- %s\\n\", log_common(), \"mount\");\n}\n\nprobe kernel.function(\"sys_umount\").call\n{\n    printf(\"%s -&gt; %s: name: %s\\n\", log_common(), \"umount\", user_string2($name, \"\"));\n}\n\nprobe module(\"autofs4\").function(\"autofs4_fill_super\").call\n{\n    if (execname() == \"automount\") {\n        printf(\"%s -&gt; %s: %s\\n\", log_common(), probefunc(), $$parms);\n    }\n}\n\nprobe module(\"autofs4\").function(\"autofs4_fill_super\").return\n{\n    if (execname() == \"automount\") {\n        printf(\"%s &lt;- %s\\n\", log_common(), probefunc());\n    }\n}\n\nprobe module(\"autofs4\").function(\"autofs4_notify_daemon\")\n{\n    printf(\"%s -&gt; %s: %s: pipefd: %d, proto: %d: %s\\n\",\n        log_common(),\n        probefunc(),\n        autofs_kernel_notify2str($pkt-&gt;v5_pkt-&gt;hdr-&gt;type),\n        $sbi-&gt;pipefd,\n        $pkt-&gt;v5_pkt-&gt;hdr-&gt;proto_version,\n        $pkt-&gt;v5_pkt-&gt;v5_packet$$);\n}\n\n/* user space probes, requires that the debuginfo package for the version of\n  * autofs be installed, and for the backtraces requires debuginfo for glibc */\nprobe process(\"/usr/sbin/automount\").function(\"handle_packet_missing_indirect\").call {\n    printf(\"%s -&gt; %s: AP: %s: PACKET: %s\\n\", log_common(), probefunc(), $ap$$, $pkt$$);\n}\n\nprobe process(\"/usr/sbin/automount\").function(\"handle_packet_missing_indirect\").return {\n    printf(\"%s &lt;- %s\\n\", log_common(), probefunc());\n    print_ubacktrace();\n}\n\nprobe process(\"/usr/sbin/automount\").function(\"mkdir_path\").call {\n    printf(\"%s -&gt; %s: path: %s\\n\", log_common(), probefunc(), user_string($path));\n}\n\nprobe process(\"/usr/sbin/automount\").function(\"st_add_task\").call {\n    printf(\"%s -&gt; %s: %s\\n\", log_common(), probefunc(), $$parms);\n    print_ubacktrace();\n}\n\nprobe process(\"/usr/lib64/autofs/lookup_file.so\").function(\"lookup_mount\").call {\n    printf(\"%s -&gt; %s: %s\\n\", log_common(), probefunc(), $$parms);\n}\n\nprobe process(\"/usr/sbin/automount\").function(\"cache_update\").call,\n      process(\"/usr/sbin/automount\").function(\"cache_add\").call {\n    key = user_string($key);\n    t = task_current();\n    trace[t] = 1;\n    printf(\"%s -&gt; %s: (KEY: %s): %s\\n\", log_common(), probefunc(), key, $$parms);\n    print_ubacktrace();\n}\n\nprobe process(\"/usr/sbin/automount\").function(\"cache_update\").return,\n      process(\"/usr/sbin/automount\").function(\"cache_add\").return {\n    t = task_current();\n    if (trace[t]) {\n        delete trace[t];\n        printf(\"%s &lt;- %s\\n\", log_common(), probefunc());\n    }\n}\n\nprobe process(\"/usr/sbin/automount\").function(\"lookup_prune_cache\").return {\n    printf(\"%s &lt;- %s\\n\", log_common(), probefunc());\n}\n\nprobe process(\"/usr/sbin/automount\").statement(\"st_readmap@daemon/state.c:587\") {\n    printf(\"%s -&gt; %s: now: %d\\n\", log_common(), \"st_readmap (starting do_readmap thread)\", $ra-&gt;now);\n}\n\n",
        "url"      : "https://blog.janestreet.com/how-does-automount-work-anyway/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Generic mapping and folding in OCaml",
        "date"     : "April 7, 2014",
        "authorId" : "moconnor",
        "author"   : "Michael O'Connor",
        "tags"     : [],
        "minsToRead" : 4,
        "content"  : "Haskell has a function fmap which can map over a number of different\ndatatypes. For example, fmap can map a function over both a List and a\nMaybe (the equivalent of an option in OCaml):\n\nPrelude&gt; fmap (+ 1) [1,2]\n[2,3]\nPrelude&gt; fmap (+ 1) (Just 3)\nJust 4\n\n\n\nUnfortunately, the equivalent is impossible in OCaml. That is, there’s no way to\ndefine an OCaml value fmap so that the two expressions:\n\n# fmap [1;2]    ~f:((+) 1)\n# fmap (Some 3) ~f:((+) 1)\n\n\n\nboth typecheck and evaluate to the right value.\n\nEven if we eliminate the complexity of type inference by specifying the type\nexplicitly, we can’t define fmap so that the two expressions:\n\n# fmap ([1;2]  : _ list)   ~f:((+) 1)\n# fmap (Some 3 : _ option) ~f:((+) 1)\n\n\n\ntypecheck and evaluate to the right value.\n\nHowever, the Generic module in Jane Street’s Core_extended library will let\nus do exactly that with just a trivial syntactic change. But before continuing,\nI’ll warn you that the Generic module is not necessarily something you’d want\nto use in real world code; it falls much more in the “cute trick” category. But\nwith that caveat, let’s look at our example using Generic:\n\n# open Core.Std;;\n# open Core_extended.Generic;;\n\n# map ([1;2] &gt;: __ list) ~f:((+) 1);;\n- : int list = [2; 3]\n# map (Some 3 &gt;: __ option) ~f:((+) 1);;\n- : int option = Some 4    \n\n\n\nNote that, after opening the Generic module, all we did to the previous\nexample was change : to &gt;: and _ to __. 
(Also, the Generic module\ncalls the mapping function map instead of fmap, but that’s inconsequential.)\n\nOf course, the trick is that &gt;:, __, list, and option are actually\nvalues defined by the Generic module in such a way that their intended usage\nlooks like a type annotation.\n\nNote that these “types” are nestable as you would expect real types to be:\n\n# map ([None; Some 3] &gt;: __ option list) ~f:((+) 1);;\n- : int option list = [None; Some 4]        \n\n\n\nThis means that you can change what map does just by changing the “type” you\nassign to its argument:\n\n# map ([None; Some 3] &gt;: __ option list) ~f:(fun _ -&gt; ());;\n- : unit option list = [None; Some ()]\n# map ([None; Some 3] &gt;: __ list) ~f:(fun _ -&gt; ());;\n- : unit list = [(); ()]\n\n\n\nThe Generic module also defines a generic fold function so that you can\naccumulate values at any “depth” in your value:\n\n# fold ([[Some 3; None]; [Some 5; Some 2]] &gt;: __ option list list) ~init:0 ~f:(+);;\n- : int = 10\n\n\n\nNot every “type” formable is __ followed by some sequence of options and\nlists: for example, Generic also provides string (considered as a\ncontainer of characters):\n\n# map ([Some \"foo\"; None; Some \"bar\"] &gt;: string option list) ~f:Char.uppercase;;\n- : string option list = [Some \"FOO\"; None; Some \"BAR\"]\n\n\n\nNote that the fact that the “types” are nestable means that these values must\nhave unusual definitions: in particular, __ (and string) are functions which\nmust be able to take a variable number of arguments. 
Indeed, these values are\ndefined using a technique sweeks wrote about in\na blog post on variable argument functions: the\nf and z in sweeks’s post are analogous here to __ and &gt;: respectively.\n\nHere’s the definition of the primitive values we’ve used so far (Generic\nactually defines a few more):\n\nlet __ k = k (fun f x -&gt; f x)\n\nlet ( &gt;: ) x t y = t (fun x -&gt; x) y x\n\nlet map x ~f = x f\n\nlet string k = k (fun f -&gt; String.map ~f)\n\nlet list map k = k (fun f -&gt; List.map ~f:(map f))\n\nlet option map k = k (fun f -&gt; Option.map ~f:(map f))\n\n\n\nThe types of these turn out to be extremely unhelpful, and you can’t really use\nthem to figure out how to use these values. For example, here is the type of\n&gt;: (and this isn’t just the inferred type of the above definition, this is the\ntype which must actually be exposed to use &gt;:):\n\nval ( &gt;: ) : 'a -&gt; (('b -&gt; 'b) -&gt; 'c -&gt; 'a -&gt; 'd) -&gt; 'c -&gt; 'd\n\n\n\nFinally, is this module actually used? The answer is no. As far as I know, it’s\nused nowhere in Jane Street’s codebase. But it’s still kind of cute.\n",
        "url"      : "https://blog.janestreet.com/generic-mapping-and-folding-in-ocaml/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Breaking down FRP",
        "date"     : "February 24, 2014",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 11,
        "content"  : "As anyone who has looked into functional reactive\nprogramming (FRP)\nknows, there are lots of competing approaches to it, and not a lot of conceptual\nclarity about how they relate to each other. In this post, I’ll try to shed some\nlight, and in particular give you some guide posts for understanding how the\ndifferent versions of FRP relate to each other. Plus, I’ll show some connections\nto a similar technique called self-adjusting computation (SAC).\n\nThe analysis here should mostly be credited to Evan Czaplicki, who gave a talk\nat Jane Street a week ago. Any confusions and mistakes are, of course, my own.\nAlso, thanks to Jake McArthur for filling me in on a few key details.\n\nIn all of this I’m basically going to talk only about discrete FRP. A lot of the\npeople involved in FRP think of continuous-time semantics as really important,\nbut that’s not the focus here. (I am curious, however, if anyone can explain why\npeople think continuous time semantics are important when writing programs for a\nlargely discrete computer.)\n\nFirst, some basics. Roughly speaking, FRP systems are meant to make it easier to\nwrite programs that react to external events by providing a natural way to\nprogram with and reason about time-varying signals.\n\nNow, time to oversimplify.\n\nAn FRP program effectively ties signals together in a dependency graph, where\neach signal is either an external input or a derived signal that feeds off of\nother signals that have already been defined. A key aspect of a system like this\nis that the language runtime uses the dependency graph to minimize the amount of\nwork that needs to be done in response to an external event.\n\nHere are some properties that you might want from your FRP system:\n\n\n  History-sensitivity, or the ability to construct calculations that react\nnot just to the current state of the world, but also to what has happened in\nthe past.\n  Efficiency. 
This comes in two forms: space efficiency, mostly meaning\nthat you want to minimize the amount of your past that you need to remember;\nand computational efficiency, meaning that you want to minimize the amount\nof the computation that must be rerun when inputs change.\n  Dynamism, or the ability to reconfigure the computation over time, as\nthe inputs to your system change.\n  Ease of reasoning. You’d like the resulting system to have a clean\nsemantics that’s easy to reason about.\n\n\nIt turns out you can’t have all of these at the same time, and you can roughly\ncategorize different approaches to FRP by which subset they aim to get. Let’s\nwalk through them one by one.\n\nPure Monadic FRP\n\nThis approach gives you dynamism, history-sensitivity and ease of reasoning, but\nhas unacceptable space efficiency.\n\nAs you might expect, the signal combinators in pure monadic FRP can be described\nas a monad. That means we have access to the usual monadic operators, the\nsimplest of which is return, which creates a constant signal.\n\nval return : 'a -&gt; 'a signal\n\n\n\nYou also have map, which lets you transform a signal by applying a function to\nit at every point in time.\n\nval map: 'a signal -&gt; ('a -&gt; 'b) -&gt; 'b signal\n\n\n\nOperators like map2 let you take multiple signals and combine them together,\nagain by applying a function to the input signals at every point in time to\nproduce the output signal.\n\nval map2 : 'a signal -&gt; 'b signal -&gt; ('a -&gt; 'b -&gt; 'c) -&gt; 'c signal\n\n\n\nNote that all of the above essentially correspond to building up a static set of\ndependencies between signals. To finish our monadic interface and to add\ndynamism, we need one more operator, called join.\n\nval join: 'a signal signal -&gt; 'a signal\n\n\n\nNested signals are tricky. In a nested signal, you can think of the outer signal\nas choosing between different inner signals. 
When these are collapsed with\njoin, you essentially get a signal that can change its definition, and\ntherefore its dependencies, in response to changing inputs.\n\nWe’re still missing history sensitivity. All of the operators thus far work on\ncontemporaneous values. We don’t have any operators that let us use information\nfrom the past. foldp is an operator that does just that, by folding forward\nfrom the past.\n\nval foldp: 'a signal -&gt; init:'acc -&gt; ('acc -&gt; 'a -&gt; 'acc) -&gt; 'acc signal\n\n\n\nWith foldp, history is at our disposal. For example, we can write a function\nthat takes a signal containing an x/y position, and returns the largest distance\nthat position has ever been from the origin. Here’s the code.\n\nlet max_dist_to_origin (pos : (float * float) signal) : float signal =\n  foldp pos ~init:0. ~f:(fun max_so_far (x,y) -&gt;\n    Float.(max max_so_far (sqrt (x * x + y * y))))\n\n\n\nHere, max_so_far acts as a kind of state variable that efficiently summarizes\nthe necessary information about the past.\n\nfoldp seems like it should be implementable efficiently, but we run into\ntrouble when we try to combine history sensitivity and dynamism. In particular,\nconsider what happens when you try to compute max_dist_to_origin on the signal\nrepresenting the position of the mouse. And in particular, what if we only\ndecide to run this computation at some point in the middle of the execution of\nour program? We then have two choices: either\n(max_dist_to_origin mouse_pos) always has the same meaning, or, its\nmeaning depends on when it was called.\n\nIn pure monadic FRP, we make the choice to always give such an expression the\nsame meaning, and thus preserve equational reasoning. We also end up with\nsomething that’s impossible to implement efficiently. In particular, this choice\nforces us to remember every value generated by every input forever.\n\nThere are various ways out of this performance trap. 
In the following, I’ll\ndescribe the different escape paths chosen by different styles of FRP systems.\n\nPure Applicative FRP\n\nThe idea behind applicative FRP is simple enough: just drop the join operator,\nthus giving up dynamism. This means that you end up constructing static\ndependency graphs. Without the ability to reconfigure, you don’t run into the\nquestion of what happens when an expression like\n(max_dist_to_origin mouse_pos) is evaluated at multiple points in time.\n\nThis is the approach that Elm takes, and seems like the primary approach that is\ntaken by practical systems concerned with describing UI interactions.\n\nThere’s a variant on pure applicative FRP called, confusingly, Arrowized FRP.\nEffectively, Arrowized FRP lets you create a finite collection of static graphs\nwhich you can switch between. If those static graphs contain history-dependent\ncomputations, then all the graphs will have to be kept running at all times,\nwhich means that, while it can be more efficient than applicative FRP, it’s not\nmaterially more expressive.\n\nImpure Monadic FRP\n\nImpure monadic FRP basically gives up on equational reasoning. In other words,\nthe meaning of (max_dist_to_origin mouse_pos) depends on when you call it.\nEssentially, evaluating an expression that computes a history-sensitive signal\nshould be thought of as an effect in that it returns different results depending\non when you evaluate it.\n\nLosing equational reasoning is not necessarily the end of the world, but my\nexperience programming in this style makes me think that it really is\nproblematic. In particular, reasoning about when a computation was called in a\ndynamic dependency graph is really quite tricky and non-local, which can lead to\nprograms whose semantics is difficult to predict.\n\nSelf-Adjusting Computations\n\nSelf-adjusting computations are what you get when you give up on\nhistory-sensitivity. In particular, SAC has no foldp operator. 
The full set of\nmonadic operators, however, including join, are in place, which means that you\ndo have dynamism. This dynamism is quite valuable, it turns out. Among other\nthings, it allows you to build a highly configurable computation that can\nrespond to reconfiguration efficiently.\n\nAs you might expect, the lack of history-sensitivity makes SAC less suitable for\nwriting user interfaces. Indeed, SAC was never intended for such applications;\nits original purpose was for building efficient on-line algorithms, i.e.,\nalgorithms that could be updated efficiently when the problem changes in a small\nway.\n\nSAC is also easy to reason about, in that all an SAC computation is doing is\nincrementalizing an otherwise ordinary functional program. You get full\nequational reasoning as long as you avoid effects within your SAC computation.\n\nHistory sensitivity for SAC\n\nAt Jane Street, we have our own SAC library called Incremental, which is used\nfor a variety of different applications. In practice, however, a lot of our SAC\napplications do require some amount of history-sensitivity. The simplest and\neasiest approach to dealing with history within SAC is to create inputs that\nkeep track of whatever history is important to your application. Then, the SAC\ncomputation can use that input without any complications.\n\nThus, if there’s an input whose minimum and maximum value you want to be\nable to depend on in your calculation, you simply set up calculations outside of\nthe system that create new inputs that inject that historical information into\nyour computation.\n\nYou can keep track of history in a more ad-hoc way by doing side-effects within\nthe functions that are used to compute a given node. 
Thus, we could write a node\nthat computes the maximum value of a given input using a reference, as follows.\n\nlet max_of_signal i =\n  let max = ref None in\n  map i (fun x -&gt;\n    match !max with\n    | None -&gt; max := Some x; x\n    | Some y -&gt;\n      let new_max = Float.max x y in\n      max := Some new_max;\n      new_max)\n\n\n\nBut this kind of trick is dangerous, particularly because of the optimizations\nthat are implemented in Incremental and other SAC implementations. In\nparticular, Incremental tries to avoid computing nodes whose values are not\npresently in use, and as such, signals that are deemed unnecessary are not kept\nup-to-date. Thus, if you create a signal by calling (max_of_signal s), and\nthen keep it around but don’t hook it into your final output, the computation\nwill stop running and will thus stop receiving updates. Then, if you pull it\nback into your computation, it will have a value that reflects only part of the\ntrue history.\n\nThere are some tricks for dealing with this in Incremental. In particular, we\nhave an operator called necessary_if_alive, which forces the node in question\nto remain alive even if it’s not necessary at the moment. That helps, but there\nare still complicated corner cases. Our preferred approach to dealing with such\ncases is to statically declare the set of history-sensitive signals, and make\nsure that those are alive and necessary at all times.\n\nBroader lessons\n\nThis is I think a theme in FRP systems: history is made tractable by limiting\ndynamism. From the work I’ve done with SAC systems, I think the usual approach\nin the FRP world is backwards: rather than start with a static system that\nsupports history and extend it with increased dynamism, I suspect it’s better to\nstart with a highly dynamic system like SAC and carefully extend it with ways of\nstatically adding history-sensitive computations. 
That said, it’s hard to be\nconfident about any of this, since this is a rather subtle area where no one has\nall the answers.\n",
        "url"      : "https://blog.janestreet.com/breaking-down-frp/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Async Parallel",
        "date"     : "January 9, 2014",
        "authorId" : "estokes",
        "author"   : "Eric Stokes",
        "tags"     : ["async","ocaml","parallel-programming"],
        "minsToRead" : 14,
"content"  : "Background\n\nParallel is a library for spawning processes on a cluster of machines, and\npassing typed messages between them. The aim is to make using other processes\nas easy as possible. Parallel was built to take advantage of multicore computers\nin OCaml, which can’t use threads for parallelism due to its non-reentrant\nruntime.\n\nIntroduction\n\nParallel is built on top of Async, an OCaml library that provides cooperative\nconcurrency. So what do we want an async interface to parallel computations to\nlook like? Well since a Deferred already captures the concept of some action\nstarting now and returning a future result, we just want a function to run that\naction in parallel, like this:\n\nval run : ?where:[`Local | `On of string | `F of (unit\n  -&gt; string)] -&gt; thunk:(unit -&gt; 'a Deferred.t)\n  -&gt; 'a Deferred.t\n\n\n\nSo what exactly does run do?\n\n\n  run creates a new process on the machine specified by [where]\n  run starts [thunk] in that process\n  run waits for [thunk] to finish and returns its result to the caller\n  [thunk] may also call run if it wants to\n\n\nThe above function is actually ALL we need; we can build anything we want on top\nof it. We could, for example, pass a closure that will return Deferred.never,\nand start up two-way communication with that new process, for example via an\naddress we agree on before it is sent to the other process. In practice Parallel\nprovides a few more things.\n\nval spawn : ?where:[`Local | `On of string | `F of (unit\n  -&gt; string)] -&gt; (('a, 'b) Hub.t -&gt; 'c Deferred.t)\n  -&gt; (('a, 'b) Channel.t * ('c, string) Result.t Deferred.t) Deferred.t\n\n\n\nSpawn is run’s more featureful cousin. The closure is started up on another\nprocess, and passed a typed Hub, and the caller is given the closure’s result,\nand a typed Channel that is connected to that Hub. A Channel is a connection to\na Hub. A Hub may have many channels connected to it at any given time. 
A very\nimportant feature of Channels is that they are mobile: they can be passed\nbetween processes, and they remain connected to their Hub. They can either be\npassed by lexical capture prior to spawn or run, or they can be sent over a\nChannel. Let’s see a small example:\n\nPing Pong\n\nopen Core.Std\nopen Async.Std\nopen Parallel.Std\n\nlet worker h =\n  Pipe.iter_without_pushback (Hub.listen_simple h) ~f:(fun (id, `Ping) -&gt;\n    Hub.send h id `Pong)\n  &gt;&gt;| fun () -&gt; `Done\n\nlet main () =\n  Parallel.spawn ~where:Parallel.random worker &gt;&gt;&gt; fun (c, _res) -&gt;\n  let rec loop () =\n    Channel.write c `Ping;\n    Channel.read c &gt;&gt;&gt; fun `Pong -&gt;\n    Clock.after (sec 1.) &gt;&gt;&gt; loop\n  in\n  loop ();\n  Clock.after (sec 60.) &gt;&gt;&gt; fun () -&gt; Shutdown.shutdown 0\n\nlet () =\n  Parallel.init ~cluster:\n    {Cluster.master_machine = Unix.gethostname ();\n    worker_machines = [\"host0\"; \"host1\"]} ();\n  main ();\n  never_returns (Scheduler.go ())\n\n\n\nLet’s take this program apart. The function worker is the worker process, which\nwill listen to its hub for Ping messages (id is the name of the client that\nsent the message) and respond to the sender with a\nPong message. Meanwhile, the main function will start up one process using\nspawn and have it run the worker function. It will then write a Ping message\nto that process every second, and read the returned Pong message. After 60\nseconds the main function will call shutdown. The toplevel action at the bottom\nfirst calls Parallel.init, and defines the cluster of three machines (the\nmaster, host0 and host1). Notice that main’s Parallel.spawn is given a ~where\nargument that will randomly pick one of the machines for the worker to run on.\nThen it starts main, and finally the async scheduler.\n\nThe most important thing to say about this program is that to the compiler it\nlooks just like any other program. 
So the type checker will check the types, and\nas a result it will enforce the protocol, to the extent possible. The other\nimportant thing is that when main dies, after the shutdown call, the framework\nwill ensure that all the worker processes (on all machines) are killed as\nwell.\n\nImplementation Notes and Gotchas\n\nThere are three kinds of processes involved in a program that uses Parallel:\n\n\n  the main process\n  the master process\n  worker processes\n\n\nParallel dynamically creates a worker process to service each call to [run].\n\nThe OS process tree looks like:\n\n| main\n|    master\n|      worker1\n|      ...\n|      workerN\n\n\n\nAs far as the OS is concerned, all workers are children of the master. However,\nfrom the perspective of Parallel, the topology is more structured. Each worker\nprocess is created on behalf of its “owner” process, which is either the main\nprocess or another worker process. One can think of the main and worker\nprocesses as arranged in a tree different than the OS tree, in which there is an\nedge from each process to its owner (the main process has no owner).\n\nParallel uses OCaml’s [Marshal] library to serialize OCaml values to and from\nstrings so that they can be sent over unix sockets between processes. For\nexample, the [f] supplied to [run] is marshaled and sent from the process\nthat calls [run] to the worker process that will run [f]. Most, but not all,\nvalues can be marshaled. Examples of values that can’t be marshaled include C\nallocated abstract tagged values, and custom blocks with no serialize/deserialize\nmethod.\n\nThe main process and all worker processes have a socket connected to the master\nprocess. The master process’s sole job is to service requests that are sent to\nthese sockets, which can ask it to create a new worker process. 
As the master\nprocess receives requests, it does what each request asks, and then sends a\nresponse back via the socket to the client that made the request.\n\nEach worker process has a socket connected to its owner process. This socket\ninitially receives the [f] that the worker is to run, and is ultimately used\nto send the result back from the worker to the owner.\n\nHere are the steps involved in implementing [run f]. There are three processes\ninvolved.\n\n\n  R = the process calling [run]\n  M = the master process\n  W = the worker process running the task\n\n\nThe steps are:\n\n\n  R asks M to create W\n  M forks W\n  M tells R about W\n  R sends [f] to W to run\n  W runs [f]\n  W sends the result of [f] to R\n  M notices W has exited, and cleans up\n\n\nWhen there are multiple machines in a cluster, each machine has a master\nprocess, and all the workers know about all master processes. When a worker\nwants to run on machine M, it looks up the address of that machine’s master\nprocess in its table before performing step 1; everything after that is exactly\nthe same as the example.\n\nChannel Passing\n\nWhen a channel is passed from one process to another, the open socket is not\nactually passed. The API makes this pretty transparent: any API call will\nreconnect the channel, but it is useful to be aware of what is really going on,\nbecause if you aren’t aware, you may create a race condition. For example, if I\nspawn a worker connected to a hub I have, and then I immediately send something,\nit may or may not arrive, because the worker may not have time to connect and\nreceive it. A better strategy is to wait for the worker to say hello, and then\nsend the data.\n\nChannel passing also means that though you created only one channel from a given\nhub, you can end up with as many connections (client ids) as workers who got\nhold of that channel. 
You can address them all individually, or you can always\nuse send_to_all if you really want to model a hub as a kind of shared bus.\n\nStdout and Stderr\n\nstdout and stderr will be forwarded back to the master machine. This can cause\nsome interleaving if you print a lot of messages, but generally works reasonably\nwell. So printf debugging can work normally, even in a parallel program spanning\nmultiple machines.\n\nSome things to avoid marshaling\n\nMonitor.t, Pcre.regexp, Writer.t, Reader.t, and similar kinds of objects\nshouldn’t be depended upon to marshal correctly. Pcre.regexp is a custom block,\nand doesn’t implement marshal/unmarshal, so it won’t work. Monitor.t, Writer.t,\nand Reader.t, because of their complex nature, generally tow the entire async\nscheduler along with them, and because of that they will fail if any job on the\nscheduler queue has a custom object (e.g. regexp, or other C object) that can’t\nbe marshaled. You also can’t marshal functions you’ve dynamically loaded (e.g.\nwith ocaml plugin, though I hear this will be fixed soonish).\n\nProcesses don’t share memory!\n\nThe library can make it look easy to create and use a process on some other\nmachine maybe halfway around the world, but even still it is another process.\nAll the normal boundaries associated with that apply, so you can’t expect global\nvariables you set in one worker process to affect another. For a large parallel\nprogram that is a good thing.\n\nShared things\n\nBecause of the way parallel works, with the master process being an image of a\nvery early state of one’s program and workers forked from the master, it is\nusually not possible to share big static things in the way one might do in C\nusing fork. Moreover, it isn’t necessarily the win you might think. If you know\nthat Unix only copies pages on write when a process forks, you might expect it\nto be a win. 
But the garbage collector ruins that completely, because as it scans it\nwill write to EVERY page, causing a copy on write fault to copy the page, so\nyou’ll end up with a non-shared copy of that big static thing in every process\nanyway. The best you can probably do is have one process own it and expose it\nwith a query interface. Moreover, if you’re running on multiple machines that IS\nthe best you can do, so you may as well get used to it.\n\nWhy Not Just Fork!?\n\nThe Unix-savvy among you may ask, what the heck are you doing with master\nprocesses and closure passing, just fork! Oh how that would make life easier,\nbut alas, it really isn’t possible. Why? You can’t write async without threads,\nbecause the Unix API doesn’t provide an asynchronous system call for every\noperation, meaning if you need to do something that might block, you must do it\nin a thread. And the list of stuff that might block is long and crippling. Want\nto read from a file without blocking for SECONDS? Sorry! Not without a\nthread you don’t. But once you’ve started a thread, all bets are off if you\nfork. POSIX actually doesn’t even say anything about what happens to threads in\na process that forks (besides saying they don’t think it’s a good idea to do so).\nIn every sane OS, only the forking thread continues in the child; all the other\nthreads are dead. OK, fine you say, let them die. But their mutexes, semaphores,\nand condition variables are in whatever state they were in the moment you\nforked, that is to say, any state at all. Unfortunately this means that having\ncreated a thread that does anything meaningful (e.g. calls into libc), if you\nfork, all bets are off as to what happens in that child process. A dead thread\nmay, for example, hold the lock around the C heap, which would mean that any\ncall into libc would deadlock trying to allocate memory (oops), that’d ruin your\nday. 
Trust me, if parallel could work in some simpler way we’d adopt it quickly!\n\nSay I Want To Have a Giant Shared Matrix\n\nThe parallelism model implemented is strictly message passing; shared memory\nisn’t implemented, but there are various hacks you could use to make this work\n(e.g. implement it yourself). Bigarray already allows mmaping files, so in\ntheory even a cluster of machines could all mmap a giant file and use/mutate it.\n\nWhy Can’t I Use Async Before Parallel.init?\n\nBy default Parallel.init does a check that you haven’t created any threads, and\nthat you haven’t made any use of async. The threads check is mandatory, but the\nasync check can be turned off by setting\n[fail_if_async_has_been_initialized] to false. Why is this check the\ndefault? Well in general you can’t initialize async libraries before calling\nParallel.init and expect them to work in the child process. The reason is that\nthe async scheduler is thrown away in the child process before calling\nScheduler.go. This is out of necessity: there is no way we can know what state\nthe scheduler is in at the moment we fork, and it would be quite unfortunate if\nit were in a bad state, or worse, if there were jobs on the queue that would get\nrun in all workers as soon as they call Scheduler.go. But as a result of this,\nany asyncy thing you create before Parallel.init won’t work in worker processes.\nFor example, say you initialize the log module before Parallel.init expecting to\nuse it in the workers. It won’t work, since all of its state (loops, writers,\netc) is invalid in the worker processes. The check exists to make sure people\nare aware of this, and to make sure it only happens if they really know it’s\nok.\n\nWhat CWD Will Worker Machine Processes Have?\n\nIf the CWD of the master exists on the worker machine, and you have permission\nto enter it, then parallel will switch to that directory before starting the\nmaster process, otherwise it will chdir to /.\n",
        "url"      : "https://blog.janestreet.com/async-parallel/",
        "image"    : null,
        "topic"    :  ["technology","async","ocaml","parallel-programming"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "10 tips for writing comments (plus one more)",
        "date"     : "December 20, 2013",
        "authorId" : "cfalls",
        "author"   : "Craig Falls",
        "tags"     : ["comments","core"],
        "minsToRead" : 5,
"content"  : "A few words about what we’re striving for in our comments, particularly in Core.\nEvery shop has their own commenting style, so I worry that if you find our\ncomments strange or lacking in some way you may just think, “Gee, that’s a weird\nway to comment your code, but I guess that’s how they like to do it.” Now\nyou’ll know our intent, so please do point out where our execution differs.\n\nYou might also find this useful if you want to contribute to Core.\n\nIf you’re straight out of university, you might not have worked on large code\nbases with others enough to have even thought about these questions, and you\nmight find this interesting on those grounds. For whatever reason, it’s one of\nthose things you’re usually just expected to pick up on the job.\n\nAnyway, here are our basic thoughts. They’re always evolving, and have until now\nmostly just been an oral tradition, but do reflect a rough consensus arrived at\nafter a lot of thinking and experience.\n\n\n  \n    Readers usually want more high-level comments and fewer low-level comments\nthan you think. A good example is that users love that one-sentence comment\nat the top of a module that makes the whole thing click in place. For\nexample, if the top of Flat_array says “This is basically like a regular\nArray of tuples, except the tuples aren’t boxed. The slots in the tuple might\nbe pointers but the tuples in the Array are inlined into the Array. This\nmeans that copying has to occur when you pull a tuple out of the Array, but\nnot when you just pull a slot out of a tuple in the Array.” then the whole\nrest of the module is obvious, and will barely need comments.\n  \n  \n    Avoid redundant comments, e.g.:\n\n    (* [remove t elt] removes [elt] from [t]. *)\n\n(* [remove_top t] remove the top element from [t] *)\n\n(* [child pool t] return the child of [t] *)\nval child : 'a Pool.t -&gt; 'a t -&gt; 'a t\n\n    \n\n    Generally, try not to say things that are obvious. 
Imagine that an actual\nperson asked you, “Hey, what’s this child function?” and dictate your answer\ninto the comment. If your answer is just, “Ok, you didn’t read the comment at\nthe top of the module, did you?” then skip it.\n  \n  \n    Put comments for users of a module in the mli and comments for the changers\nof a module in the ml.\n  \n  \n    When you make a feature\n(see this earlier post on\nfeature-based review), make sure the comments are updated on any modules you\nchanged. If you didn’t update the comment, it suggests you didn’t read it,\nwhich suggests you didn’t expect it to be useful. If in fact it isn’t, just\nremove it. An incorrect comment is worse than no comment.\n  \n  \n    Give more external context and fewer internal details. The internal stuff is\nalready in the code, while the external stuff would otherwise be left\nunspecified. For example, internally we have an Mpv for rounding prices to\nsomething exchanges will accept. (For example, the “minimum price variation”\nor “tick size” for most US stocks is one penny, but for low-priced stocks it\ncan be smaller.) This module is where new hires will end up if they don’t yet\nknow about this, so it’s a good place to explain how it works. It’s not so\nnecessary to explain the details of the module itself – it’s pretty\nstraightforward once you have the context.\n  \n  \n    After you’ve written a good comment, reconsider the code and see if you can\nobviate the comment. A common pattern is that:\n\n    (* x is a y *)\nval x : t\n\n    \n\n    can often be better written:\n\n    val y : t\n\n    \n\n    If, written in the second form, you don’t feel like you have to explain that\ny is an x, then the code has been improved.\n  \n  \n    As a general rule, a comment that required thought to create saves the reader\nmore thought than a comment that was easy to generate.\n  \n  \n    Often, it’s more important to say what’s bad about the code than what’s good.\nWhat are the caveats? 
When can it fail? What cases aren’t handled?\n  \n  \n    Prevent obvious improvements that are actually disimprovements. For example,\nif something could be refactored to be clearer, shorter, and slower, add a\n(* performance hack *) comment. With just those two words, you clarify that\nthe reader isn’t missing something, the code really is redundant, and also\nthat it shouldn’t be “fixed” in the author’s opinion.\n  \n  \n    Give the intuition, not the proof. Examples are often good – you could\ncomment a fibonacci generator with (* 1, 1, 2, 3, 5, 8, 13, ... *). Or a\nlink to Wikipedia. Too much (* For all n &gt; 0, [f(n)] = [f( ... tends to be\nno better than just reading the code. If the comment and the code explain\nthe same thing two ways, it adds more if they’re two very different ways.\n  \n  \n    Don’t be too rigid in following these rules, or any other rules. Comments\nare for humans, and humans are complicated, so the best way to explain your\nideas will depend on lots of things. Hard rules like “every function needs a\ncomment” do more harm than good.\n  \n\n",
        "url"      : "https://blog.janestreet.com/10-tips-for-writing-comments-plus-one-more/",
        "image"    : null,
        "topic"    :  ["technology","comments","core"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "A module type equivalence surprise",
        "date"     : "December 11, 2013",
        "authorId" : "sweeks",
        "author"   : "Stephen Weeks",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "I usually think of two module types S1 and S2 as being equivalent if the\nfollowing two functors type check:\n\nmodule F12 (M : S1) = (M : S2)\nmodule F21 (M : S2) = (M : S1)\n\n\n\nAnd by equivalent, I mean indisinguishable – one should be able to use S1\nanywhere one uses S2, and vice versa, with exactly the same type checker and\nsemantic behavior.\n\nHowever, I found an an example today with two module types that are equivalent,\nboth in my internal mental model of equivalence and in my formal definition that\nF12 and F21 type check, but that one can distinguish using module type of.\n Here are the module types:\n\nmodule type S1 = sig module N : sig type t end type u = N.t end\n\n    module type S2 = sig\n      type u\n      module N : sig\n        type t = u\n      end\n    end\n\n    module F12 (M : S1) = (M : S2)\n    module F21 (M : S2) = (M : S1)\n\n\n\nAnd here is a context that distinguishes them: F1 type checks, but F2 does\nnot:\n\nmodule F1 (A : S1) = struct\n  module F (B : sig type t end) = (B : module type of A.N)\nend\nmodule F2 (A : S2) = struct\n  module F (B : sig type t end) = (B : module type of A.N)\nend\n\n\n\nWhat’s going on is that in F1, module type of A.N decides to abstract t,\nbecause it doesn’t have a definition. But in F2, module type of A.N does not\nabstract t, because it is defined to be u.\n\nSince I thought of S1 and S2 as equivalent, I would have preferred that\nmodule type of not abstract t in both cases, and thus that both F1 and\nF2 be rejected. But I don’t see anything unsound about what OCaml is doing.\n",
        "url"      : "https://blog.janestreet.com/a-module-type-equivalence-surprise/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "RWO tidbits: the runtime",
        "date"     : "December 8, 2013",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : ["ocaml","real-world-ocaml"],
        "minsToRead" : 1,
        "content"  : "This is my favorite tweet about Real World OCaml.\n\n\n  Real World OCaml devotes an entire section of 5 chapters to the runtime\nsystem. That’s 5 chapters more than most language books. Refreshing\n\n  — keyist (@keyist)\n\n  November 18, 2013\n\n\nIt is indeed pretty rare for a language introduction to spend this much time on\nthe runtime. One reason we included it in RWO is that OCaml’s simple and\nefficient runtime is one of its real strengths – it makes OCaml simple to\nreason about from a performance perspective, and simple to use in a wide variety\nof contexts. Also, critically, the runtime is also simple enough to explain!\nEven though it’s one of my favorite parts of the book, I had very little to do\nwith it. Anil wrote most of it, with some critical help from Jeremy Yallop (who\nworked on the ctypes library featured\nin\nChapter 22,\nand Stephen Weeks (whose notes formed the basis\nof\nChapter 23 and\nChapter 24).\n\nIn any case, if you’re interested in how OCaml represents values, how the C\ninterface works, or how a simple generational GC works, you should check out\nPart III.\n",
        "url"      : "https://blog.janestreet.com/rwo-tidbits-the-runtime/",
        "image"    : null,
        "topic"    :  ["technology","ocaml","real-world-ocaml"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "RWO tidbits: Benign effects",
        "date"     : "December 1, 2013",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "Now that Real World OCaml is out, I thought it\nwould be fun to have a series of posts highlighting different interesting\ncorners of the book.\n\nToday: the section on benign\neffects.\n\nBenign effects are uses of imperative programming that by and large preserve\nthe functional nature of the code you write. Typically, benign effects are used\nto improve performance without changing the semantics of your code. I think\nthere are a lot of fun and interesting ideas here, my favorite one being an\nelegant approach to dynamic programming, as exemplified by a simple\nimplementation of a function for computing the edit distance between two\nstrings.\n\nAnd of course, if you enjoy the book, you can get a hardcopy on\nAmazon.\n\n\n",
        "url"      : "https://blog.janestreet.com/rwo-tidbits-benign-effects/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "The making of Real World OCaml",
        "date"     : "November 11, 2013",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : ["book","ocaml"],
        "minsToRead" : 4,
        "content"  : "\n\nIt’s taken a good long while, but Real World OCaml is finally done. You can read\nit for free at http://realworldocaml.org, or buy a\nhardcopy or an ebook\nversion.\n\nThe idea for the book was born in a bar in Tokyo in 2011. After a talk-filled\nday at ICFP, a few of us, Ashish Agarwal and Marius Eriksen, Anil, and myself,\nwent out drinking. We were all bellyaching over the lack of a high-quality OCaml\nbook in English, and, emboldened by the Guinness and egged on by Ashish and\nMarius, Anil and I decided that we were just going to have to go ahead and write\nthe book ourselves. We brought Jason into the project soon after.\n\nFrom the very beginning, Anil and I had different visions for the book. From my\nperspective, it was a simple proposition: there was no good book available, and\nwe were going to write one. The goal was to document the state of the art so as\nto make it more accessible.\n\nAnil’s vision was grander. He argued that writing a book is an opportunity to\ndiscover all of the things that are just too embarrassing to explain. And, once\ndiscovered, you needed to find a way to get all of those things fixed before the\nbook was published.  It was Anil’s grander vision that we followed,\nand to good effect. It’s not that we fixed the problems ourselves. We solved\nsome problems directly, but by and large, we ended up getting help from those\nwho were better positioned than us to fix the problems at hand.\n\nHere are a few examples of pieces that came together over those two years.\n\n\n  OPAM. The core work on OPAM was done by Thomas Gazagnaire at OCamlPro\n(funded by Jane Street), with Anil collaborating closely. This is probably\nthe biggest improvement of the bunch. It’s hard to overstate how\ntransformational OPAM is. Before OPAM, installing OCaml libraries was a\ncomplex chore. Now it’s a joy.\n  The Core release process. 
We decided early on to base the book on Core,\nJane Street’s alternative to OCaml’s standard library. But the release\nprocess around Core was too monolithic and too hard for external\ncollaborators. We reorganized the process completely, moving to weekly\nreleases to github, with the repos broken out into manageable components.\nMost of this work was done by Jeremie Dimino and Yury Sulsky at Jane Street.\n  Ctypes. Anil was in charge of writing the section on C bindings, and he\ndecided that the standard way was just too painful to explain. That was the\ngerm of ctypes, a library, written by Jeremy Yallop, that lets you build C\nbindings entirely from within OCaml, with an easy and simple interface.\n  Short paths. One of the real pains associated with working with Core is\nthe heavy use that Core makes of the module system. OCaml’s error messages,\nunfortunately, do not cope with this terribly well, leading to absurd types\nin error messages, so you might see a type rendered as\nCore.Std.Int.Map.Key.t Core.Std.Option.Monad_infix where it could just as\nwell have been rendered as int option. We worked with Jacques Garrigue to\nimplement a better heuristic for picking type names (suggested by Stephen\nWeeks, who implemented the same heuristic for Mlton). This is now available\nin the 4.01 version of the compiler, via the -short-paths patch.\n\n\nThis period also saw the founding of OCaml Labs, a lab at Cambridge University\ndevoted to improving OCaml as a platform, with Anil at the head.\n\nAnd we’re not done. OCaml Labs and OCamlPro are still working on improving OPAM\nand building out a curated OCaml Platform that’s built on top of OPAM. There’s\nongoing work on improving documentation generation so we can have better online\nAPI docs. 
Jacques Garrigue and Leo White are working on adding module aliases to\nOCaml which will greatly improve compilation time for libraries like Core that\nneed to manage large namespaces.\n\nAnd there are more projects that are improving OCaml and its surrounding\ninfrastructure that have no direct connection to Real World OCaml, like:\n\n\n  Frederic Bour and Thomas Refis’ work on Merlin, a tool for providing\nIDE-like tooling from within editors like VI and Emacs.\n  Mark Shinwell’s work on improving GDB support for OCaml,\n  Fabrice le Fessant’s work improving OCaml’s interactions with performance\ntools like perf.\n  Work that Ashish Agarwal, Christophe Troestler, Esther Baruk, and now many\nothers, have poured into improving the http://ocaml.org website.\n\n\nReal World OCaml is of course a personal milestone for Jason, Anil and myself\n(and for Leo White, Jeremy Yallop and Stephen Weeks, who made some key\ncontributions to the text). But viewed from a broader perspective, it’s just one\npart of the increasing swirl of activity in the OCaml community.\n\nOCaml has been a great language for a very long time. Now, we’re growing a\nsocial and technological infrastructure around it to match.\n",
        "url"      : "https://blog.janestreet.com/the-making-of-real-world-ocaml/",
        "image"    : null,
        "topic"    :  ["technology","book","ocaml"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "What's 2013 + 50? 1969, of course!",
        "date"     : "September 11, 2013",
        "authorId" : "pmay",
        "author"   : "Pavel May",
        "tags"     : [],
        "minsToRead" : 7,
        "content"  : "What happens when the latest CentOS 6.4/RHEL/FreeBSD GnuTLS certtool gets used\nto generate a TLS certificate with a 18250-day validity period? Time travel back\nin time, is what. \n\n\n  Note: This applies to the CentOS-released GnuTLS v 2.8.5. Latest source\ndistribution is 3.2.4 Curiously enough, even in FreeBSD (by way of a\ncounterpoint), gnutls “stable” is 2.12.23, and devel is 2.99.4_1.\nProfessionals call this sort of a thing a “hint”. FreeBSD’s 2.12.23 also has\nthe described behavior. FreeBSD’s 2.99.4_1 cannot be downloaded via the usual\n“portinstall” mechanism – it has a known security vulnerability which hasn’t\nbeen patched and portaudit does its best impersonation of The Grumpy Cat meme\nand says “No”.\n\n\nSo, you’d like to use CentOS’ certtool to create a self-signed certificate?\nSure, no problem.\n\nFirst step, create a skeleton certificate authority.\n\n$ rpm -qf which certtool gnutls-utils-2.8.5-10.el6_4.2.x86_64\n$ uname -a; cat /etc/redhat-release Linux buildhost 3.9.5pm1 #3 SMP PREEMPT Thu Jun 13 11:20:29 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux CentOS release 6.4 (Final)\n\n\n\nFirst, CA private key:\n\n$ certtool -p --outfile ca-temp-key.pem\nGenerating a 2048 bit RSA private key...\n\n\n\nAll good.\n\nNext, CA signing certificate:\n\n$ certtool -s --load-privkey ca-temp-key.pem --outfile ca-test-signing.pem\nGenerating a self signed certificate...\nPlease enter the details of the certificate's distinguished name. Just press enter to ignore a field.\nCountry name (2 chars): US\nOrganization name: Jane Street\nOrganizational unit name: Systems\nLocality name: US\nState or province name: NY\nCommon name: test-ca.janestreet.com\nUID:\nThis field should not be used in new certificates.\nE-mail:\nEnter the certificate's serial number in decimal (default: 1378834082):\n\nActivation/Expiration time. The certificate will expire in (days): 18250\n\nExtensions.\nDoes the certificate belong to an authority? 
(y/N): y\nPath length constraint (decimal, -1 for no constraint):\nIs this a TLS web client certificate? (y/N):\nIs this also a TLS web server certificate? (y/N):\nEnter the e-mail of the subject of the certificate:\nWill the certificate be used to sign other certificates? (y/N): y\nWill the certificate be used to sign CRLs? (y/N):\nWill the certificate be used to sign code? (y/N):\nWill the certificate be used to sign OCSP requests? (y/N):\nWill the certificate be used for time stamping? (y/N):\nEnter the URI of the CRL distribution point:\nX.509 Certificate Information:\n        Version: 3\n        Serial Number (hex): 522f56a2\n        Validity:\n                Not Before: Tue Sep 10 17:28:04 UTC 2013\n                Not After: Wed Dec 31 23:59:59 UTC 1969\n        Subject: C=US,O=Jane Street,OU=Systems,L=US,ST=NY,CN=test-ca.janestreet.com\n        Subject Public Key Algorithm: RSA\n                Modulus (bits 2048):\n                        ce:cb:49:2c:3d:a2:e2:97:6f:71:df:43:e1:fa:b1:14\n                        1e:b1:e5:51:13:1c:cc:7c:18:38:29:bf:08:70:f1:35\n                        d9:5d:ad:51:dc:0e:9d:f9:e6:ec:53:20:b0:04:fe:cb\n                        0e:a6:45:27:c0:f2:cc:34:45:fd:97:2c:11:b7:86:e9\n                        8f:9f:58:fa:90:ac:e7:9f:4e:a0:7f:8e:eb:5b:6f:15\n                        17:8d:82:a1:30:cf:3f:37:a8:44:6a:1d:2e:3b:69:36\n                        3e:34:c5:2a:f3:d2:2b:1f:81:ec:25:81:76:0e:1d:b9\n                        7f:12:23:a2:af:b7:e5:9b:f7:f6:be:c4:23:65:f1:4a\n                        63:fc:ec:92:5b:fc:f0:2c:6b:80:ee:fb:54:bf:7f:16\n                        33:b8:26:e5:d4:f4:ec:86:18:26:3e:31:5f:66:cf:0c\n                        81:cd:ef:c2:ec:ad:fc:26:07:2d:67:94:de:98:c2:32\n                        d4:6e:59:31:6a:35:1d:db:19:b4:a5:27:6b:94:be:8a\n                        77:2f:8c:7c:6b:cb:af:71:62:fa:7a:41:e5:da:63:5b\n                        95:d1:05:62:56:33:07:67:8c:bf:3f:64:11:dc:84:69\n                        
e6:f2:b7:f2:6c:a0:e1:36:fc:e3:00:c0:11:26:dd:44\n                        f0:ca:02:97:67:70:15:85:34:e9:ca:d6:60:a4:37:8b\n                Exponent (bits 24):\n                        01:00:01\n        Extensions:\n                Basic Constraints (critical):\n                        Certificate Authority (CA): TRUE\n                Key Usage (critical):\n                        Certificate signing.\n                Subject Key Identifier (not critical):\n                        d7dfcb520769255a65638e6dc3b899648dd4e447\nOther Information:\n        Public Key Id:\n                d7dfcb520769255a65638e6dc3b899648dd4e447\n\n\n\nAnd here’s the crux of the issue:\n\nValidity:\n                Not Before: Tue Sep 10 17:28:04 UTC 2013\n                Not After: Wed Dec 31 23:59:59 UTC 1969\n\n\n\nNot after 1969? Yeah… Let me get my flux capacitor and a DeLorean and get back\nto you.\n\nFreeBSD’s 2.12.3:\n\n$ certtool -p --outfile ca-temp-key.pem\nGenerating a 2432 bit RSA private key...\n$certtool -s --load-privkey=ca-temp-key.pem --outfile ca-test-signing.pem\nGenerating a self signed certificate...\nPlease enter the details of the certificate's distinguished name. Just press enter to ignore a field.\nCountry name (2 chars): US\nOrganization name: Jane Street\nOrganizational unit name: Systems\nLocality name: New York\nState or province name: NY\nCommon name: test-ca.janestreet.com\nUID:\nThis field should not be used in new certificates.\nE-mail:\nEnter the certificate's serial number in decimal (default: 1378834456):\n\nActivation/Expiration time.\nThe certificate will expire in (days): 18250\n\nExtensions.\nDoes the certificate belong to an authority? (y/N): y\nPath length constraint (decimal, -1 for no constraint):\nIs this a TLS web client certificate? (y/N):\nWill the certificate be used for IPsec IKE operations? (y/N):\nIs this also a TLS web server certificate? 
(y/N):\nEnter the e-mail of the subject of the certificate:\nWill the certificate be used to sign other certificates? (y/N): y\nWill the certificate be used to sign CRLs? (y/N):\nWill the certificate be used to sign code? (y/N):\nWill the certificate be used to sign OCSP requests? (y/N):\nWill the certificate be used for time stamping? (y/N):\nEnter the URI of the CRL distribution point:\nX.509 Certificate Information:\n    Version: 3\n    Serial Number (hex): 522f5818\n    Validity:\n        Not Before: Tue Sep 10 17:34:17 UTC 2013\n        Not After: Thu Jan 01 00:00:00 UTC 1970\n    Subject: C=US,O=Jane Street,OU=Systems,L=New York,ST=NY,CN=test-ca.janestreet.com\n    Subject Public Key Algorithm: RSA\n    Certificate Security Level: Normal\n        Modulus (bits 2432):\n            00:ea:3e:bf:c2:bb:55:90:4f:e1:d3:da:2b:3e:b2:81\n            64:97:8f:db:70:27:ad:94:ae:1d:dd:ab:28:73:6e:60\n            2a:39:8a:c0:1b:2c:ae:1e:f7:ce:c5:dc:01:8a:9e:31\n            15:e3:e5:9c:67:63:05:ec:24:6b:0c:74:7d:6b:ae:bc\n            ba:8b:4c:fd:b8:2b:37:74:f1:10:39:a1:c7:f3:fb:dc\n            b8:09:80:2f:a5:8b:79:13:66:e0:8b:93:56:3b:3b:dd\n            fb:6d:78:49:cf:c6:5c:57:f0:5d:1f:2d:73:98:b2:eb\n            1e:10:be:0e:e7:de:2b:9b:d2:88:e0:49:34:a9:30:28\n            ad:4c:60:8c:11:50:bb:25:c2:e5:88:0a:4d:6a:84:a9\n            48:2e:07:ed:dc:e0:04:9c:bd:90:2b:fb:10:92:ca:8d\n            cc:51:4f:f8:fa:d2:51:a4:12:50:75:e6:e5:87:f2:67\n            5f:17:4e:12:63:4c:aa:70:2e:20:b9:07:63:1d:41:89\n            f4:f7:7f:c7:91:55:05:49:94:ff:7f:1b:dc:23:59:08\n            15:c0:9f:13:c7:90:bf:c0:c1:8f:02:9b:6f:28:71:e4\n            1e:90:0b:1f:7b:f6:4b:1a:2d:1f:24:d4:d4:6d:11:3a\n            3d:e2:7e:41:d1:0d:1c:88:da:db:29:5a:1d:4d:62:c3\n            ac:c6:dc:2c:e9:d9:7d:3d:fc:af:3a:10:fe:3a:b7:bc\n            8a:f1:ed:9b:85:89:b6:e2:e8:0c:36:df:55:c6:60:7a\n            1c:1c:3d:54:7f:d7:d5:ea:1c:0d:d1:0c:c6:ef:99:cf\n            5d\n        Exponent (bits 24):\n           
 01:00:01\n    Extensions:\n        Basic Constraints (critical):\n            Certificate Authority (CA): TRUE\n        Key Usage (critical):\n            Certificate signing.\n        Subject Key Identifier (not critical):\n            6ec09c8592ba3904a301051b60223a5e50cad333\nOther Information:\n    Public Key Id:\n        6ec09c8592ba3904a301051b60223a5e50cad333\nIs the above information ok? (y/N): n\n\n\n\nGnuTLS 3.2.4 (compiled from source):\n\ngnutls-3.2.4/src$ ./certtool -p --outfile ca-temp-key.pem\nGenerating a 2432 bit RSA private key...\n\ngnutls-3.2.4/src$ ./certtool -s --load-privkey=ca-temp-key.pem --outfile ca-test-signing.pem\nGenerating a self signed certificate...\nPlease enter the details of the certificate's distinguished name. Just press enter to ignore a field.\nCommon name: test-ca.janestreet.com\nUID:\nOrganizational unit name: Systems\nOrganization name: Jane Street\nLocality name: New York\nState or province name: NY\nCountry name (2 chars): US\nEnter the subject's domain component (DC): janestreet.com\nEnter the subject's domain component (DC):\nThis field should not be used in new certificates.\nE-mail:\nEnter the certificate's serial number in decimal (default: 1378836930):\n\n\n  Activation/Expiration time.\n  The certificate will expire in (days): 18250\n\n  Extensions.\n  Does the certificate belong to an authority? (y/N): y\n  Path length constraint (decimal, -1 for no constraint):\n  Is this a TLS web client certificate? (y/N):\n  Will the certificate be used for IPsec IKE operations? (y/N):\n  Is this a TLS web server certificate? (y/N):\n  Enter a dnsName of the subject of the certificate:\n  Enter a URI of the subject of the certificate:\n  Enter the IP address of the subject of the certificate:\n  Enter the e-mail of the subject of the certificate:\n  Will the certificate be used to sign other certificates? (y/N): y\n  Will the certificate be used to sign CRLs? (y/N):\n  Will the certificate be used to sign code? 
(y/N):\n  Will the certificate be used to sign OCSP requests? (y/N):\n  Will the certificate be used for time stamping? (y/N):\n  Enter the URI of the CRL distribution point:\n  X.509 Certificate Information:\n      Version: 3\n      Serial Number (hex): 522f61c2\n      Validity:\n          Not Before: Tue Sep 10 18:15:31 UTC 2013\n          Not After: Wed Aug 29 18:15:31 UTC 2063\n      Subject: CN=test-ca.janestreet.com,OU=Systems,O=Jane Street,L=New York,ST=NY,C=US,DC=janestreet.com\n      Subject Public Key Algorithm: RSA\n      Algorithm Security Level: Normal (2432 bits)\n          Modulus (bits 2432):\n              00:b9:f0:d3:81:b1:d6:09:71:45:47:e6:66:ac:41:0b\n              93:93:b3:68:28:60:08:5e:e4:ba:9e:43:5f:b5:05:55\n              24:f0:34:ab:11:8a:fe:74:9e:d2:f8:e4:ab:c6:5c:f3\n              2c:f9:0b:b4:4c:26:b9:3d:58:3b:16:73:85:28:95:13\n              ec:7d:7c:8b:38:c8:fa:08:64:de:5e:f5:9a:f5:70:1c\n              cb:d4:d0:4a:e7:ad:5b:20:89:cc:29:91:c0:58:3b:dd\n              38:f8:6f:56:f5:9b:25:05:44:ae:f9:9d:67:0b:59:96\n              b7:da:4c:24:37:84:a5:f6:8f:32:5b:ae:e3:e8:ac:d2\n              1b:7d:b4:67:42:f7:60:95:30:e4:8e:fa:4d:db:5b:65\n              4f:f3:04:ca:94:74:d0:b2:42:20:8f:be:22:1b:77:34\n              34:00:7d:0f:1a:7f:33:5a:56:b7:c6:88:9b:68:5b:7d\n              84:d6:c4:c2:3e:8a:b5:40:6e:35:64:10:46:b1:28:ac\n              8c:1f:2c:55:98:14:96:9c:e9:17:93:d3:28:30:04:8e\n              7d:9e:ae:55:77:13:c5:7b:1b:cd:e1:d9:85:62:66:ad\n              64:14:11:f3:2a:a4:f2:9a:88:36:d7:b9:7d:3f:c7:8f\n              45:7c:b9:7d:11:73:da:c3:36:5e:12:e3:8a:8f:94:c1\n              4e:33:be:e6:2c:49:d4:cf:39:d8:38:7c:fd:c5:7d:06\n              1d:2d:87:8e:ea:7e:80:f7:aa:25:bf:e8:a7:0f:17:c7\n              12:e7:21:05:aa:3a:0c:9a:a8:1c:86:98:fc:ea:30:40\n              29\n          Exponent (bits 24):\n              01:00:01\n      Extensions:\n          Basic Constraints (critical):\n              Certificate Authority (CA): TRUE\n     
     Key Usage (critical):\n              Certificate signing.\n          Subject Key Identifier (not critical):\n              683cf71bc67af324655d661ecd7043a0707e3ee7\n  Other Information:\n      Public Key Id:\n          683cf71bc67af324655d661ecd7043a0707e3ee7\n      Public key's random art:\n          +--[ RSA 2432]----+\n          |         . . .+o.|\n          |          + .  +o|\n          |           o . .*|\n          |     . .    o. =.|\n          |      = S   oo...|\n          |     . o o o  +  |\n          |          * .  E |\n          |         oo=     |\n          |        ...o.    |\n          +-----------------+\n  Is the above information ok? (y/N):\n\n\n\n\nLovely, that.\n\nOn CentOS 6.4, rpm --whatdepends and rpm --requires show very few (at least\nin our general install) direct dependencies.\n\nSeparately, it is not clear why GnuTLS in CentOS/RHEL and FreeBSD (there\nis also evidence that Debian and Ubuntu are on similar version paths) use an\nold(er) version of GnuTLS. There is a distinct possibility of an ABI change\nsince there is a major version number jump.\n",
        "url"      : "https://blog.janestreet.com/whats-2013-50-1969-of-course/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Patch review vs diff review, revisited",
        "date"     : "May 3, 2013",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : ["code-review","hg","ocaml"],
        "minsToRead" : 10,
        "content"  : "I’ve been thinking about code review a lot recently.\n\nCode review is a key part of our dev process, and has been from the beginning.\nFrom our perspective, code is more than a way of getting things done, it’s a way\nof expressing your intent. Code that’s easy to read and understand is likely to\nbe more robust, more flexible, and critically, safer. And we care about safety a\nlot, for obvious reasons.\n\nBut the importance of code review doesn’t mean that we’ve always done a good job\nof organizing it. I’ll talk a bit more about how we used to do code review, how\nwe do it now, and the impact that those changes have had. \n\nThe bad old world\n\nOur old code review process was what you might call batch-oriented. We’d\nprepare a set of changes for a release, and then, after someone gave it a quick\nlook-over, combine these changes together in a branch. We’d then read over these\nchanges very carefully, with multiple people reading each file, making comments,\nrequesting changes, and fixing those changes, until the code was in a releasable\nstate.\n\nThis was a big and expensive process, involving many people, and quite a lot of\nwork and coordination. Given the time it took, we focused our code review on our\nso-called critical path systems, i.e., the ones that are involved in sending\norders to the market.\n\nThe management task was complex enough that we wrote a tool called cr for\nmanaging and tracking the reading of these diffs, parceling out responsibility\nfor different files to different people. We’ve actually blogged about this\nbefore, here and\nhere.\n\nBatch-oriented review worked well when we and our codebase were smaller, but it\ndid not scale. By combining multiple changes into a single branch, you were\nstuck reading a collection of unrelated changes, and the breadth of the changes\nmade fitting it all into your head harder. 
Even worse, when you throw a bunch of\nchanges together, some are going to take longer than others, so the release is\nblocked until the slowest change gets beaten into shape.\n\nThe end result is that, while we found code review to be indispensable in\ncreating high quality code that we felt comfortable connecting to the markets,\nthe overhead of that review kept on getting higher and higher as we grew.\n\nWe needed a better solution.\n\nFeature-based review\n\nAnother approach to code review, and a more common one, is patch-based review.\nIn patch review, someone proposes a change to the current release in the form of\na patch, and it is the patch itself that is reviewed. Once it passes review, it\ncan be applied to the tree. Patch-based review is great in that it gives you\nindependence: one patch taking a while doesn’t block other patches from getting\nout.\n\nWe avoided patch-based review initially because we were worried about the\ncomplexities of dealing with conflicting patches. Indeed, one issue with\npatch-based review is that the state of the tree when the patch is reviewed is\nlikely not the state of the tree when the patch is applied. Even when this\ndoesn’t lead to textual conflicts, this should leave you a little nervous, since\na patch that is correct against one version of the tree is not necessarily\ncorrect against a changed tree.\n\nAnd then, what do you do when there’s a conflict and the patch no longer applies\ncleanly? You can rebase the patch against the new state of the tree, and then\nre-review the patch from scratch. But humans are really bad at investing mental\nenergy in boring work, and carefully re-reviewing a patch you’ve already mostly\nread is deadly dull.\n\nMoreover, when do you decide that there’s a conflict? 
When dealing with patches\nthat involve file moves and renames, even deciding what it means for a patch\nwritten previously to still apply cleanly is a tricky question.\n\nAlso, squaring patch-based review with a git or hg-based workflow can be tricky.\nThere’s something quite nice about the github-style pull-request workflow; but\nthe semantics of merging are pretty tricky, and you need to be careful that what\nyou read corresponds with the actual changes that are made by the merge.\n\nFor all the problems, the virtues of patch-based review are clear, and so about\nsix months ago we started a project to revamp our cr tool to make it suitable\nfor doing patch-like review. The new version of cr is now organized around\nwhat we call features, which are essentially hg bookmarks (similar to git\nbranches) augmented with some associated metadata. This metadata includes\n\n\n  An English description of the change\n  A base-revision that the changes should be read against\n  An owner\n  A collection (usually just one other than the owner) of full-feature\nreviewers.\n\n\nThe workflow for a developer goes something like this:\n\n\n  create a new feature by running cr feature create. You’ll select a\nname for the feature and write the initial description. The base-revision\nwill automatically be chosen as the most recent release.\n  Write the code, using hg in the ordinary way, making commits as you go\nand pushing the result to a shared, multi-headed repo that has all of the\nfeatures people are working on.\n  When you think the code is in a good state, get the feature enabled for\nreview. 
At this point, you’ll need to get a full-feature reviewer selected.\nIt’s this person’s job to read every change that goes into this feature.\n  The full feature reviewer then reads the diff from the base-revision to\nthe tip, adding comments, requesting fixes, and reading diffs forward until\nthey’re happy, at which point, it’s seconded.\n  Once it’s seconded, the feature is enabled for review by anyone who is\nsigned up for review for the specific files you touched. How many file\nreviewers there are depends on the nature of the project. In our most\nsafety-critical systems, every file has three reviewers. In some other\nsystems, there are no file reviewers at all.\n\n\nThe remaining work needs to be done by the release manager. A release manager\ncan create a new release based on a set of features that:\n\n\n  are fully reviewed, and have no outstanding reviewer complaints to be\nresolved.\n  compile cleanly on their own and pass their automated tests\n  have as their base revision the previous release\n  can be merged together cleanly\n\n\nChecking that things “can be merged together cleanly” is actually tricky, since\nyou can’t just trust hg’s notion of a merge. cr has its own merge logic that\nis more conservative than what hg and git do. The biggest worry with hg is\nthat it tries to guess at a reasonable base-point for the 3-way merge (usually\nthe greatest common ancestor of the two heads to be merged). Usually this works\nwell, but it’s easy to construct crazy cases where on one branch you make\nchanges that are just silently dropped in the merge. There is also some rather\nsurprising behavior that can come into play when files are moved, copied or\ndeleted as part of the mix.\n\ncr, on the other hand, will always choose the base-point of the features to be\nmerged as the base-point for the 3-way merge. 
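cr itself is an internal tool, but the idea of pinning the 3-way merge against an explicit base-revision can be sketched with the standard diff3 tool (a hypothetical stand-in; cr’s real merge logic is more involved and VCS-aware):

```shell
# Sketch only: merge two feature heads against an explicitly pinned
# base-revision, rather than letting the VCS guess a common ancestor.
printf 'one\ntwo\nthree\n'  > base.txt       # the shared base-revision
printf 'ONE\ntwo\nthree\n'  > feature_a.txt  # feature A edits line 1
printf 'one\ntwo\nTHREE\n'  > feature_b.txt  # feature B edits line 3
# diff3 -m performs the 3-way merge with base.txt as the common ancestor,
# so the diffs that were reviewed (each feature vs. base) are exactly the
# diffs being combined:
diff3 -m feature_a.txt base.txt feature_b.txt > merged.txt
cat merged.txt
```

Because both edits are read against the same pinned base, this merge comes out clean with both changes applied; had the features touched the same lines, diff3 would flag a conflict rather than silently dropping a change.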
This way, the diffs that are\nreviewed are also the diffs that are used for constructing the merged node.\nAlso, cr has some extra sanity conditions on what merges it’s willing to try.\nThis all greatly reduces the scope for surprising results popping out the other\nside of a merge.\n\nIf the base-revision of a given feature is against an older release, then you\nneed to rebase the review before it can be released, i.e., update the\nbase-revision to the most recent release. Among other things, this requires you to\nmerge your changes with tip. If there are conflicts, you then either need to\nreview the resolution of the conflicts, or you simply need to reread the diffs\nfrom scratch. The last bit is pretty rare, but it’s an important escape hatch.\n\nHow’d it go?\n\nThe new code-review process has had a dramatic effect on our world. The review\nprocess for our main repository used to take anywhere from a couple of weeks to\nthree months to complete a cycle. Today, those releases go out every week,\nlike clockwork. Everyone knows that if they can get their feature cleaned up and\nready, they can always get it out that week. Indeed, if you’re following our\nopen-source releases on github, you’ll see that new packages have shown up once\na week for the last 16 weeks.\n\nFeature-based review has led to a significant increase in the rate of change of\nour critical-path systems. Code review is now considerably less painful, and\nmost importantly, it’s easier than ever to say no to a feature that isn’t ready\nto go. In old-style batch review, there was a lot of pressure not to hold up a\nrelease while polishing some small bit, which sometimes led you to release code that\nwasn’t really ready. Now, that problem has largely vanished.\n\nThe barrier to entry for people who want to contribute to critical path systems\nhas also been lowered. 
This has also contributed to us being able to get\nprojects out the door faster.\n\nBut the most striking result I’ve seen is from our post-trade group, which\noperates outside of the review process used for the critical-path systems. The\npost-trade team is responsible for our infrastructure that handles everything\nafter a transaction is done, like tracking the clearing and settlement of our\ntrades or managing our trading books and records.\n\nPost-trade has historically had a more relaxed approach to code review – they\ndo it, but not on all parts of the system, and not in a particularly strict way.\nIn the last few months, however, they switched over to using the new\nfeature-based workflow, and even though they’re doing a lot more code review\n(which takes serious time), their overall process has become faster and more\nefficient. We think that’s largely due to having a well-managed workflow for\nmanaging and merging independent features, quite apart from the benefits of the\nreview itself.\n\nI’m now pushing to get feature-based review adopted throughout the firm.\nObviously, not all code needs to be scrutinized to the same level – having\nthree reviewers for every file is sometimes sensible, sometimes overkill – but\nensuring that no change can get in unless one other person reads it and thinks\nit’s reasonable is a good baseline rule. Review has a lot of benefits: it\nimproves the quality of the code, gives you a great opportunity for training,\nand helps spread knowledge. Those benefits make sense everywhere we have people\nprogramming.\n\nMaybe the biggest lesson in this for me is the importance of thinking through\nyour processes, focusing on the biggest bottlenecks, and doing what you can to\nfix them.\n",
        "url"      : "https://blog.janestreet.com/patch-review-vs-diff-review-revisited/",
        "image"    : null,
        "topic"    :  ["technology","code-review","hg","ocaml"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Hackerschool tutorial at Jane Street",
        "date"     : "April 2, 2013",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : ["async","core","hackerschool","ocaml"],
        "minsToRead" : 1,
        "content"  : "\n\nWe just had a really fun tutorial on OCaml, Core and Async for about 20 people\nfrom the current Hacker School batch. The\ntutorial was in part based on an early cut of Real World\nOCaml, and in part based on the examples in the\nCore Hello World repo that has\na few examples of Core and Async in action.\n\nA few weeks ago, we gave out a small, early snapshot of the first few chapters\nof RWO, and some instructions for installing OPAM so\nyou could get the appropriate versions of OCaml, Core and Async installed, in\nthe hopes of getting people prepared for the tutorial. A few people also started\nworking on the 99 problems in\nOCaml from the\nocaml.org website.\n\n\n\nThe tutorial itself lasted about 4 1/2 hours, and we covered a lot of territory\nin a short period of time. I’m sure not everyone fully understood everything,\nbut enough good questions were asked that I think people were getting the basic\nideas.\n\nIn any case, it was a blast from our perspective, and I hope we can do it again.\n\n“…you can write really pretty OCaml code…”\n\nI got a bunch of feedback from the Hacker School folk, including the following\nfrom Alan O’Donnel, one of the staff members.\n\n\n  I think the most eye-opening part of the seminar was seeing Command; it made\nme realize two things: one, you can write really pretty OCaml code, and two,\nyou guys write really pretty OCaml code! My (totally unfounded) image of OCaml\n(the language plus ecosystem) was something like a mix of Haskell and\nErlang–some nice type stuff, but kind of ugly looking and with kind of yucky\nlibraries. I was genuinely surprised that the Core and Async code we looked at\nwas so pretty; it feels really well-designed and elegant, and it makes me want\nto write OCaml code.\n\n\nMaybe we need to write a blog-post on how to use Command…\n",
        "url"      : "https://blog.janestreet.com/hackerschool-tutorial-at-jane-street/",
        "image"    : null,
        "topic"    :  ["technology","async","core","hackerschool","ocaml"] ,
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Maps, sets, and hashtables in core",
        "date"     : "November 12, 2012",
        "authorId" : "dhouse",
        "author"   : "David House",
        "tags"     : [],
        "minsToRead" : 4,
        "content"  : "The below post is mostly lifted from an email I sent to the Core mailing\nlist. It explains the type\nhackery we employ in order to get a really nice interface for maps, sets,\nhashtables and other such containers. It may look complex, but it actually works\nout to be remarkably simple to use, and ends up giving you a good number of static\nguarantees. I’ll use the example of hashtables, but the language readily\ntranslates into sets / maps.\n\nThere are two types of hashtables in core. Ones that use polymorphic comparison,\nand ones that use a specific comparison function that is hopefully more\nefficient and has non-surprising semantics (we basically think polymorphic\ncomparison, despite its convenience, is too surprising to be an overall good\nthing).\n\nThe type of hashtables using polymorphic comparison is\n('key, 'value) Hashtbl.Poly.t. The type of hashtables using, e.g., int\ncomparison for the keys is 'value Int.Table.t. Given the previous paragraph, you\nshould always try to use Foo.Table when you can.\n\nWhen you create a hashtable (e.g. using create, of_alist, or t_of_sexp),\nyou must use the specific module name, i.e.\nlet table = Int.Table.create () in. However, when you already have a hashtable\nin your hands, and you want to use accessor functions, you should just use\nHashtbl.foo, regardless of what comparison function it uses.\n\nTo translate into Maps and Sets:\n\n'value Foo.Table.t  ('key,'value) Hashtbl.Poly.t  Hashtbl.foo\n'value Foo.Map.t    ('key,'value) Map.Poly.t      Map.foo\nFoo.Set.t           'element Set.Poly.t           Set.foo\n\n\n\n\n\nIf you have your own type and want to make Table, Map and Set submodules, it’s\nreally easy:\n\nmodule T = struct\n  type t = ... with compare, sexp\n  let hash = (* your hash function, maybe Hashtbl.hash *)\nend\ninclude Comparable.Make(T)\ninclude Hashable.Make(T)\n\n\n\nSaying “with compare” generates an efficient comparison function specialised\nto your type. 
(Note that all component types need to have comparison functions\ndefined too, whether through “with compare” or through primitives.) The\nComparable.Make functor adds in modules to make you satisfy the Comparable.S\nsignature (basically the Set and Map modules, and a few more). The Hashable.Make\nfunctor adds in modules to make you satisfy Hashable.S (basically Hashtbl, as\nwell as some others like Hash_set). If you don’t want the Hashable stuff, there\nis no need to define a hash function. (Although Hashtbl.hash is normally not a\nbad choice.)\n\n\n\nHere’s how this all works under the hood:\n\nThe type of maps is “really” ('key, 'value, 'comparator) Map.t. Maps contain,\nas part of their representation, the function that is used for comparing keys, i.e. a function\nof type 'key -&gt; 'key -&gt; int. But what is this “comparator” thing?\n\nWe can first motivate things by saying: it’s a pain to have to type Int.Map.find\nfor int-maps, String.Map.find for string-maps, etc. etc. It’d be nice to have a\nsingle type and use Map.find for everything. But this presents a problem because\nof functions like Map.merge, which takes two maps and combines them. You need to\nknow that the comparison functions are identical, but how can you do this?\n\nSo we have this extra comparator phantom type. Nothing in the actual\nrepresentation has a type involving 'comparator: it’s just for static checking.\nIf you want to have a new comparison function, you must mint a new comparator\ntype. (Including the Comparable signature does this for you.)\n\nI originally wrote this last section with hashtables in mind, but due to a\nwrinkle, hashtables work a little differently. Maps and sets are fully\ncomparator-ified, but we have yet to completely cut over hashtables. As a result,\nthe following code typechecks but gives a runtime error:\n\nHashtbl.merge (Int.Table.create ()) (Hashtbl.Poly.create ())\n\n\n\n(The situation is not fully terrible: the above example only works if one side\nis Hashtbl.Poly.) 
We’re working on fixing this inconsistency – expect to see it\nin a version of core soon.\n",
        "url"      : "https://blog.janestreet.com/maps-sets-and-hashtables-in-core/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Bootstrapping OCaml/async on the Raspberry Pi",
        "date"     : "October 22, 2012",
        "authorId" : "mbacarella",
        "author"   : "Michael Bacarella",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "On Friday morning I discovered a Raspberry Pi on my desk. The Raspberry Pi is a\nsmall single-board computer powered by a 700MHz ARM CPU. You can pick them up\nfor as little as $25 and boot ready-made full-featured Linux images off of SD.\n Out of the box, however, the Raspberry Pi includes no cables,\ninstructions, or boot media. Fortunately the office community quickly came\ntogether and delivered an HDMI-to-DVI cable, an SD card, and a mini-USB charger.\nReady to roll!\n\nWe followed the Quick Start\nGuide, opting for the Soft-float\nDebian “wheezy” image to play it safe.\n\nPower on! Lacking any real BIOS, the Raspberry Pi booted to Linux almost\ninstantly and after choosing some yes/nos from an ncurses setup menu (the only\nway to GUI) it rendered the standard soothing Linux gray-on-black console with\nbash prompt, cursor blinking expectantly. It even went ahead and requested an IP\naddress from our DHCP server once I connected ethernet.\n\n\n\nCool.\n\nSince this is Jane Street it wasn’t long before someone popped the question:\n\nCan the Raspberry Pi run OCaml?\n\nCan it build and run Async!?\n\nIt sure can! It’s a snap with\nopam:\n\nsudo apt-get install ocaml git m4\ngit clone https://github.com/OCamlPro/opam.git\ncd opam\n./configure && make install\nopam init\n\n\n\nUpdate your shell environment as per opam init’s instructions. I threw this into\nmy ~/.profile and re-logged in:\n\nwhich opam && eval $(opam config --env)\n\n\n\nNext I used opam to install async and all of its dependencies: opam install\nasync.\n\nIf you have trouble building opam from the github tip try updating to the\nrevision we used, cdc6decbf. 
The entire process takes about 2-3 hours as opam\nbuilds everything from source.\n\nHere’s how we pulled together a simple async echo server:\n\nmkdir echo_server\ncd echo_server\ncat &gt; hello_async.ml\n\n\n\nopen Core.Std\nopen Async.Std\n\nlet port = 10007\n\nlet handler _addr reader writer =\n  let rec echo_loop () =\n    Reader.read_line reader\n    &gt;&gt;= function\n      | Eof -&gt; return ()\n      | Ok line -&gt;\n        let s = line ^ \"\\n\" in\n        Writer.write writer s;\n        echo_loop ()\n      in\n      echo_loop ()\n\nlet () =\n  let d =\n    Tcp.Server.create (Tcp.on_port port) ~on_handler_error:`Ignore handler\n    &gt;&gt;| fun _server -&gt;\n    printf \"Echo server started on port %d\\n%!\" port\n  in\n  don't_wait_for d;\n  never_returns (Scheduler.go ())\n\n\n\ncat &gt; Makefile\n\nOCAMLMAKEFILE = OCamlMakefile\n\nRESULT = hello_async\n\nSOURCES = hello_async.ml\nPACKS = async\nTHREADS = true\nANNOTATE = true\n\n-include $(OCAMLMAKEFILE)\n\n\n\nFor simple projects OCamlMakefile is hard to beat. Grab it here:\n\nwget https://bitbucket.org/mmottl/ocaml-makefile/raw/a50165b23fb73c66716777d778c4c84ec2aa7183/OCamlMakefile\n\n\n\nYou should now have a file called OCamlMakefile. Now build, then start the server\nin the background:\n\nmake\n./hello_async &\n\n\n\nNow for the moment of truth, let’s connect to the echo server with netcat.\n\nnc localhost 10007\nhello async!\nhello async!\n\n\n\nAwesome!\n",
        "url"      : "https://blog.janestreet.com/bootstrapping-ocamlasync-on-the-raspberry-pi/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Announcing OCaml Labs",
        "date"     : "October 19, 2012",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "Jane Street is deeply invested in OCaml and in the larger OCaml community, which\nwe depend on both for the new recruits it provides as well as for the libraries\nand tools that come out of it. And so, we’ve been looking for ways to encourage\nOCaml’s growth and help it thrive.\n\nOCaml Labs (or OCL, for short) is\nthe latest step in that effort. OCaml Labs is focused on pushing OCaml forward\nas a platform, making it better for users of all stripes, from industrial users\nlike us, to the growing set of schools that use OCaml as a teaching tool, to\nresearchers who want a principled and pragmatic language to base their research\non.\n\nOCL is housed at the Cambridge Computing Laboratory at Cambridge University, and\nAnil Madhavapeddy is the technical lead on the project. The first job for OCaml\nLabs is going to be to create an OCaml Platform, a stable, conveniently packaged\nversion of the most important pieces of software that you need in order to\nprogram in OCaml.\n\n(It’s worth noting that a big chunk of the work for the OCaml Platform is being\ndone by OCamlPro, in particular, their work on the\nnew OPAM package manager. If you haven’t used OPAM\nyet, you should try it. It still needs some work, but I believe it’s the best\nway to install OCaml packages bar none.)\n\nBut the platform is just the beginning. OCL will be involved in all aspects of\nthe OCaml toolchain, from development tools to core compiler optimizations, and\neverything in between.\n\nOCL doesn’t work in a vacuum, of course. Part of the Labs’ mission is to\nsimplify things for the many established contributors to OCaml, from INRIA\nitself, which remains the heart and home of the language, to established\ncommercial users like Jane Street, Citrix and LexiFi, to commercial services\ncompanies like OCamlPro, to independent users and contributors.\n\n2013 is looking to be an exciting year for OCaml. 
If you’re an OCaml hacker who\nwants to join the effort, you should consider applying for a\njob at OCaml\nLabs.\n\n(And, of course, if you want a job where you get to apply OCaml to real world\nproblems, you should consider applying to Jane\nStreet.)\n\n(Addendum, Anil has a\npost on this as\nwell…)\n",
        "url"      : "https://blog.janestreet.com/announcing-ocaml-labs/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Jane Street at OUD",
        "date"     : "September 27, 2012",
        "authorId" : "dhouse",
        "author"   : "David House",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "Jane Street gave a number of talks at the collection of conferences\ncolocated with this year’s ICFP. For me, the most exciting day was the OCaml\nUsers and Developers meeting – it’s always awesome to hear people’s experiences\nwriting cool software in OCaml, and there was good news for the community in the\nform of opam and OCamlLabs, a new lab housed\nwith the Cambridge University Computer Lab, of which I’m sure there will be more\nnews soon.\n\nThere were three presentations by Jane Street people at OUD: I gave an\nintroduction to async,\nMark gave some from-the-trenches tips on how to debug OCaml programs, and Ron\nspoke on the Core suite\nmore generally.\n\nThe videos and slides are online at oud.ocaml.org. The\nbest thing is that the talks are only a convenient twenty minutes long each!\n\n\n",
        "url"      : "https://blog.janestreet.com/jane-street-at-oud/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Observer effect and YOU (NSLCD and GetPWNam()-a-plenty)",
        "date"     : "September 18, 2012",
        "authorId" : "pmay",
        "author"   : "Pavel May",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "So, whilst rooting around in one of our servers’ /var/log/*, I noticed that\nnslcd was attempting to look up, as usernames, apparently just regular strings\nappearing in previous lines of a log file. This behavior isn’t limited to nslcd\ninteracting with a specific daemon.\n\nTo the GOOGLES, then! At least one other person On The Internets has experienced\nthis, but no solutions were present there:\nhttp://old.nabble.com/Bug-505926%3A-command-%22tail-syslog-%7C-ccze%22-triggers-errors-in-nslcd-td20530700.html\n\nWhat gives? The logs look like this:\n\nAug 31 10:19:08 nyc-dhcp dhcp/commit: Answer:\nAug 31 10:19:08 nyc-dhcp dhcp/commit: ;; -&gt;&gt;HEADER&lt;&lt;- opcode: UPDATE, status: NOERROR, id:  58056\nAug 31 10:19:08 nyc-dhcp dhcp/commit: ;; flags: qr; ZONE: 1, PREREQ: 0, UPDATE: 2, ADDITIONAL: 1\nAug 31 10:19:08 nyc-dhcp dhcp/commit: ;; ZONE SECTION:\nAug 31 10:19:08 nyc-dhcp dhcp/commit: ;AD-DOMAIN.com.#011#011#011IN#011SOA\nAug 31 10:19:08 nyc-dhcp dhcp/commit:\nAug 31 10:19:08 nyc-dhcp dhcp/commit: ;; UPDATE SECTION:\nAug 31 10:19:08 nyc-dhcp dhcp/commit: HOST-NAME-153.AD-DOMAIN.com.#0110#011ANY#011A#011\nAug 31 10:19:08 nyc-dhcp dhcp/commit: HOST-NAME-153.AD-DOMAIN.com.#0111800#011IN#011A#0111.2.3.40\nAug 31 10:19:08 nyc-dhcp dhcp/commit:\nAug 31 10:19:08 nyc-dhcp dhcp/commit: ;; TSIG PSEUDOSECTION:\nAug 31 10:19:08 nyc-dhcp dhcp/commit: 3487017774.sig-HOST-AD-DC-000.AD-DOMAIN.com. 0 ANY TSIG gss-tsig. 
1346422748 36000 37 TSIG_SIG_TSIG_SIG_TSIG_SIG_TSIG_SIG 58056 NOERROR 0\nAug 31 10:19:08 nyc-dhcp dhcp/commit:\nAug 31 10:19:08 nyc-dhcp dhcp/commit: Answer:\nAug 31 10:19:08 nyc-dhcp dhcp/commit: ;; -&gt;&gt;HEADER&lt;&lt;- opcode: UPDATE, status: NOERROR, id:  37071\nAug 31 10:19:08 nyc-dhcp dhcp/commit: ;; flags: qr; ZONE: 1, PREREQ: 0, UPDATE: 2, ADDITIONAL: 1\nAug 31 10:19:08 nyc-dhcp dhcp/commit: ;; ZONE SECTION:\nAug 31 10:19:08 nyc-dhcp dhcp/commit: ;3.2.1.in-addr.arpa.#011IN#011SOA\nAug 31 10:19:08 nyc-dhcp dhcp/commit:\nAug 31 10:19:08 nyc-dhcp dhcp/commit: ;; UPDATE SECTION:\nAug 31 10:19:08 nyc-dhcp dhcp/commit: 40.3.2.1.in-addr.arpa. 0#011ANY#011PTR#011\nAug 31 10:19:08 nyc-dhcp dhcp/commit: 40.3.2.1.in-addr.arpa. 1800 IN#011PTR#011HOST-NAME-153.AD-DOMAIN.com.\nAug 31 10:19:08 nyc-dhcp dhcp/commit:\nAug 31 10:19:08 nyc-dhcp dhcp/commit: ;; TSIG PSEUDOSECTION:\nAug 31 10:19:08 nyc-dhcp dhcp/commit: 2881735749.sig-HOST-AD-DC-000.AD-DOMAIN.com. 0 ANY TSIG gss-tsig. 1346422748 36000 37 TSIG_SIG_TSIG_SIG_TSIG_SIG_TSIG_SIG 37071 NOERROR 0\nAug 31 10:19:08 nyc-dhcp dhcp/commit:\nAug 31 10:19:08 nyc-dhcp nslcd[20167]: [a00487] nslcd_passwd_byname(;;): invalid user name\nAug 31 10:19:08 nyc-dhcp nslcd[20167]: [cb1b60] nslcd_passwd_byname(-&gt;&gt;header&lt;&lt;-): invalid user name\nAug 31 10:19:08 nyc-dhcp dhcpd: DHCPREQUEST for 1.2.3.40 (1.2.3.241) from 12:34:54:67:89:AB (HOST-NAME-MON1) via 1.2.3.2\nAug 31 10:19:08 nyc-dhcp dhcpd: DHCPACK on 1.2.3.40 to 12:34:54:67:89:AB (HOST-NAME-MON1) via 1.2.3.2\nAug 31 10:19:08 nyc-dhcp nslcd[20167]: [59648f] nslcd_passwd_byname(AD-DOMAIN.com.#011#011#011in#011soa): invalid user name\nAug 31 10:19:08 nyc-dhcp nslcd[20167]: [df331d] nslcd_passwd_byname(HOST-NAME-153.AD-DOMAIN.com.#0110#011any#011a#011): invalid user name\nAug 31 10:19:08 nyc-dhcp nslcd[20167]: [08a9e4] nslcd_passwd_byname(HOST-NAME-153.AD-DOMAIN.com.#0111800#011in#011a#0111.2.3.40): invalid user name\nAug 31 10:19:08 nyc-dhcp nslcd[20167]: [167394] 
nslcd_passwd_byname(TSIG_SIG_TSIG_SIG_TSIG_SIG_TSIG_SIG): invalid user name\nAug 31 10:19:08 nyc-dhcp nslcd[20167]: [21145c] nslcd_passwd_byname(3.2.1.in-addr.arpa.#011in#011soa): invalid user name\nAug 31 10:19:08 nyc-dhcp nslcd[20167]: [34dfe8] nslcd_passwd_byname(0#011any#011ptr#011): invalid user name\nAug 31 10:19:08 nyc-dhcp nslcd[20167]: [d22b79] nslcd_passwd_byname(in#011ptr#011HOST-NAME-153.AD-DOMAIN.com): invalid user name\nAug 31 10:19:08 nyc-dhcp nslcd[20167]: [37975e] nslcd_passwd_byname(TSIG_SIG_TSIG_SIG_TSIG_SIG_TSIG_SIG): invalid user name\n\nOn the off-chance it was rsyslog doing massive amounts of GetPWNam()-style calls, I tried syslog-ng. Nope, same behavior. Something else is causing this insanity. After upgrading to nss-pam-ldapd 0.8.10 (latest from source, CentOS uses 0.7.15), the errors start looking like this, instead:\n\n2012-08-31T13:50:54.587921-04:00 HOST-NAME-016 Synergy 1.4.9: INFO: switch from \"HOST-NAME-P2\" to \"HOST-NAME-016.AD-DOMAIN.com\" at 2081,5\n2012-08-31T13:50:54.588105-04:00 HOST-NAME-016 Synergy 1.4.9: INFO: entering screen\n2012-08-31T13:50:54.615333-04:00 HOST-NAME-016 nslcd[29418] [eebeea] &lt;passwd=\"2081,5\"&gt; request denied by validnames option\n2012-08-31T13:50:54.628272-04:00 HOST-NAME-016 nslcd[29418] [65eb66] &lt;passwd=\"passwd=\"2081,5\"&gt; request denied by validnames option\n2012-08-31T13:50:54.649388-04:00 HOST-NAME-016 nslcd[29418] [ae65de] &lt;passwd=\"passwd=\"passwd=\"2081,5\"&gt; request denied by validnames option\n2012-08-31T13:50:54.655346-04:00 HOST-NAME-016 nslcd[29418] [13d013] &lt;passwd=\"passwd=\"passwd=\"passwd=\"2081,5\"&gt; request denied by validnames option\n2012-08-31T13:50:54.661263-04:00 HOST-NAME-016 nslcd[29418] [e0986f] &lt;passwd=\"passwd=\"passwd=\"passwd=\"passwd=&gt; request denied by validnames option\n2012-08-31T13:50:54.667272-04:00 HOST-NAME-016 nslcd[29418] [4614f7] &lt;passwd=\"passwd=\"passwd=\"passwd=\"passwd=&gt; request denied by validnames option\n\n\n\nThe 
‘validnames’ option is a new thing. Here’s a snippet of the man page on\nnslcd.conf about it:\n\nvalidnames REGEX\nThis option can be used to specify how user and group names are verified within the system. This pattern is used to check all user and group names that are requested and returned from LDAP.\nThe regular expression should be specified as a POSIX extended regular expression. The expression itself needs to be separated by slash (/) characters and the 'i' flag may be appended at the end to indicate that the match should be case-insensitive.\nThe default value is /^[a-z0-9._@$][a-z0-9._@$ \\~-]*[a-z0-9._@$~-]$/i\n\n\n\nIn the end, the culprit was the tool with which I was looking at the syslogs –\nccze. CCZE is a colorizer. Part of deciding what colour things should\nhave involves CCZE doing username (aka GetPWNam()) lookups, which caused all these\nshenanigans.\n\nObserve, the observer effect at work.\n",
        "url"      : "https://blog.janestreet.com/observer-effect-and-you-nslcd-and-getpwnam-a-plenty/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Repos, RPMs, and bootstraps, oh my!",
        "date"     : "June 21, 2012",
        "authorId" : "pmay",
        "author"   : "Pavel May",
        "tags"     : [],
        "minsToRead" : 4,
        "content"  : "Dell’s OMSA (Open Manage Server Administrator) tools are quite nice, but the\n6.5.x (and prior) version’s installation process did not lend itself to being\nautomated and managed by Puppet. Now, with OMSA 7.0, things have changed, and\nmostly for the better. Especially for simplifying things from the puppet\nmanagement side, and there is even a possibility that the new system will work\nwell in our netboot environment.  The older-style repository mirror\nwould give us a heady mix a-la:\n\n/repos/pub/mirrors/linux.dell.com/OMSA_6.5.3\n$ ls\nbootstrap.cgi  per415                        system.ven_0x1028.dev_0x01e7\ndx6000         per510                        system.ven_0x1028.dev_0x01ea\ndx6000g        per515                        system.ven_0x1028.dev_0x01eb\ndx6012s        per610                        system.ven_0x1028.dev_0x01f0\nHEADER.shtml   per710                        system.ven_0x1028.dev_0x0205\nmirrors.cgi    per715                        system.ven_0x1028.dev_0x0208\npe0420         per805                        system.ven_0x1028.dev_0x020b\npe0430         per810                        system.ven_0x1028.dev_0x020c\npe0440         per815                        system.ven_0x1028.dev_0x020f\npe0800         per900                        system.ven_0x1028.dev_0x0210\npe0830         per905                        system.ven_0x1028.dev_0x0221\npe0840         per910                        system.ven_0x1028.dev_0x0223\npe0850         pet100                        system.ven_0x1028.dev_0x0225\npe0860         pet105                        system.ven_0x1028.dev_0x0235\npe1430         pet110                        system.ven_0x1028.dev_0x0236\npe1435         pet110ii                      system.ven_0x1028.dev_0x0237\npe1800         pet300                        system.ven_0x1028.dev_0x023c\npe1850         pet310                        system.ven_0x1028.dev_0x025c\npe1855         pet410                        system.ven_0x1028.dev_0x027b\npe1900 
        pet605                        system.ven_0x1028.dev_0x0287\npe1950         pet610                        system.ven_0x1028.dev_0x028b\npe1955         pet710                        system.ven_0x1028.dev_0x028c\npe2800         platform_independent          system.ven_0x1028.dev_0x028d\npe2850         repoview.html                 system.ven_0x1028.dev_0x029b\npe2900         RPM-GPG-KEY-dell              system.ven_0x1028.dev_0x029c\npe2950         RPM-GPG-KEY-libsmbios         system.ven_0x1028.dev_0x02a3\npe2970         system.ven_0x1028.dev_0x016c  system.ven_0x1028.dev_0x02a4\npe6800         system.ven_0x1028.dev_0x016d  system.ven_0x1028.dev_0x02a5\npe6850         system.ven_0x1028.dev_0x016e  system.ven_0x1028.dev_0x02a6\npe6950         system.ven_0x1028.dev_0x016f  system.ven_0x1028.dev_0x02d3\npem600         system.ven_0x1028.dev_0x0170  system.ven_0x1028.dev_0x02d4\npem605         system.ven_0x1028.dev_0x0180  system.ven_0x1028.dev_0x02dc\npem610         system.ven_0x1028.dev_0x0183  system.ven_0x1028.dev_0x02f1\npem610x        system.ven_0x1028.dev_0x0185  system.ven_0x1028.dev_0x0430\npem710         system.ven_0x1028.dev_0x018a  system.ven_0x1028.dev_0x043c\npem710hd       system.ven_0x1028.dev_0x01ae  system.ven_0x1028.dev_0x0444\npem805         system.ven_0x1028.dev_0x01b1  system.ven_0x1028.dev_0x0445\npem905         system.ven_0x1028.dev_0x01b2  system.ven_0x1028.dev_0x045d\npem910         system.ven_0x1028.dev_0x01b3  system.ven_0x1028.dev_0x045f\npem915         system.ven_0x1028.dev_0x01b6  system.ven_0x1028.dev_0x0488\nper200         system.ven_0x1028.dev_0x01b7  system.ven_0x1028.dev_0x0489\nper210         system.ven_0x1028.dev_0x01b8  system.ven_0x1028.dev_0x04d6\nper210ii       system.ven_0x1028.dev_0x01b9  system.ven_0x1028.dev_0x04dd\nper300         system.ven_0x1028.dev_0x01bb  system.ven_0x1028.dev_0x04de\nper310         system.ven_0x1028.dev_0x01df  system.ven_0x1028.dev_0x04fb\nper410         system.ven_0x1028.dev_0x01e6  
_tools\n\n\n\nThe setup wasn’t without its flaws – for one thing, one could not simply point\nYum at this lot. Instead, the bootstrap.cgi script was run (by way of wget -O -\npiped to bash – what could go wrong with that?), which generated a repo\ndefinition file in /etc/yum.repos.d/ based on the system platform on which the\nbootstrap was run, from which Yum could now install srvadmin, et cetera. Dell’s\ntake on the matryoshka (nested-doll) installation method?\n\nTechnically, the initial steps are: 1) wget -q -O -\nhttp://repo-server/OMSA/hardware/latest/bootstrap.cgi | bash 2) yum clean all\n3) yum install srvadmin-all 4) /sbin/service dataeng start\n\nIn and of itself, this setup isn’t all that terrible. When it has to be\nautomated with Puppet (our system configuration management system), however,\nshenanigans ensue.\n\nSaid shenanigans multiply at a rate that’d make a flat cat turn rainbow with\nenvy when we try to manage rack-mounted netbooted Dell PowerEdge servers – the\ncustomized installation coded to a different hardware chassis means either\nseparate netboots for each hardware class (and then, separate from that, a netboot\nenvironment for the desktop-class machines which aren’t supported by OMSA), or\ntrickery at boot-time. 
We chose the latter, but the lack of happiness among all involved was\nnoticeable.\n\nEnter OMSA 7.0. The linux.dell.com site does not appear to have the new software.\nInstead, one gets a large tarball within which there lives a different set of\nfiles:\n\n/repos/pub/mirrors/linux.dell.com/OMSA_7.0$ ls -alF\ntotal 538\ndrwxr-xr-x 6 root root    197 Jun 20 14:48 ./\ndrwxr-xr-x 8 root root    334 Jun 20 11:35 ../\n-r-xr-xr-x 1 root root    899 Feb 15 08:39 COPYRIGHT.txt*\ndrwxrwxr-x 9 root root    144 Feb 15 08:39 docs/\n-r-xr-xr-x 1 root root   9975 Feb 15 08:39 license.txt*\ndrwxr-xr-x 5 root root    107 Feb 15 02:38 linux/\n-rwxrwxr-x 2 root root 111347 Feb 15 08:36 setup.sh*\n\n\n\nHaving to run setup.sh is not all that much more convenient than the previous setup.\nThere is, however, one VERY large difference – there is now a common set of\nRPMs (split by OS) rather than by hardware, which allowed us to simply make a\nrepo out of them.\n\nIngredients: OMSA 7.0 (one tarball), createrepo (one RPM, installed)\n\nSteps: 1) Untar the OMSA 2) Create/switch to the OMSA 7.0 repo root directory\nThen, an incantation:\n\nmkdir -p 5/x86_64/RPMS\ncp linux/RPMS/supportRPMS/*/RHEL5/x86_64/* 5/x86_64/RPMS\nmkdir -p 6/x86_64/RPMS\ncp linux/RPMS/supportRPMS/*/RHEL6/x86_64/* 6/x86_64/RPMS\nfor rel in 5 6\ndo\n  cd $rel/x86_64\n  createrepo .\n  cd ..\ndone\n\n\n\nCraft yourself a yum repo file not unlike what’s listed below:\n\n[dell-omsa]\nname=Dell OMSA repository - 7.0\ntype=rpm-md\nbaseurl=http://repo-server/OMSA_7.0/$releasever/$basearch\ngpgcheck=0\nenabled=1\n\n\n\nWhen in doubt, yum clean all after any change to the /etc/yum.repos.d/\nfiles.\n\nNow, ‘yum install srvadmin-all’ works. 
The advantage is that we can have the\nrepo definition put into place rather simply, and not worry about running\nscripts and various other things surrounding this.\n\nThe one thing the old method gave us was the ability to gather firmware versions\nof different components, as well as apply firmware updates. The New and Improved\nWay ™ doesn’t seem to provide this functionality, though it may be just a matter\nof figuring out the new thing.\n",
        "url"      : "https://blog.janestreet.com/repos-rpms-and-bootstraps-oh-my/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Speed Up",
        "date"     : "June 5, 2012",
        "authorId" : "jkilburg",
        "author"   : "John Kilburg",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "Sometimes we need to inspect a lot of systems at once and it’s annoying to have\nto wait a long time for a result. What to do?\n\nJane Street created a library for asynchronous programming called Async which\nyou can read about on the OCaml\nCore site.\nWe can use that to speed things up a bit.  Say that we need to ping a\nbunch of systems to make sure that they’re accessible before performing some\nother checks. We can do something like:\n\nopen Core.Std\nopen Async.Std\n\nmodule Ashell = struct\n  module Shell = Core_extended.Shell\n  module Process = Shell.Process\n\n  let k_shell_command k f fmt =\n    ksprintf (fun command -&gt; k f (Process.shell command)) fmt\n\n  let sh_test ?true_v =\n    Process.test_k (k_shell_command (fun f cmd -&gt;\n      In_thread.run (fun () -&gt; f cmd))) ?true_v\nend\n\nlet ping_host = Ashell.sh_test \"/bin/ping -c 1 %s &gt;/dev/null 2&gt;&1 || /bin/false\"\n\nlet ping_all host_list =\n  Deferred.List.iter\n    host_list\n    ~how:`Parallel\n    ~f:(fun h -&gt;\n      ping_host h\n      &gt;&gt;| fun p -&gt;\n      printf \"%s%s\\n\" h (if p then \"\" else \" (ping failed)\"))\n  &gt;&gt;&gt; fun () -&gt; shutdown 0\n\nlet () =\n  ping_all [\"node1\"; \"node2\"; \"node3\"; \"node4\"];\n  never_returns (Scheduler.go())\n\n\n\nWhen Jane Street’s Async_extended library is released you’ll be able to ditch\nthe Ashell module above and use Shell.test in the program.\n\nIf it looks daunting then read the introduction linked above and you’ll be able\nto write faster code for systems administration tasks quickly and safely typed!\nAwesome!\n",
        "url"      : "https://blog.janestreet.com/speed-up/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Finding references to lazily unmounted filesystems",
        "date"     : "May 14, 2012",
        "authorId" : "pmay",
        "author"   : "Pavel May",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "Just because a file system is lazily unmounted (or so claims the mount command\nand /proc/mounts), doesn’t mean it is no longer in use. Here’s how to find all the\nfile handle holders on the now mostly-invisible file system: We recently\nmigrated a heavily-used read-only network filesystem from one NAS to another. On\nmachines with references to the filesystem, attempting to unmount the old\nfilesystem gives the familiar error:\n\numount: /share: device is busy\n\nTo make the transition seamless, we lazily unmounted the old filesystem and\nmounted the new filesystem in its place. This way, new processes will use the\nnew filesystem and old references will go away as processes die.\n\nHowever, doing a lazy unmount complicates finding the references to the old\nfilesystem. Turning off the old NAS could cause processes still referencing it\nto hang unpredictably at some point in the future.\n\nFinding references to lazily unmounted filesystems is complicated because the\nfull path no longer appears in, say, lsof. We noticed the path was modified in a\npeculiar way: the filesystem prefix was truncated. While we might normally see\nthis:\n\n$ sudo lsof -nn\nCOMMAND     PID        USER   FD      TYPE             DEVICE      SIZE                 NODE NAME\ninit          1        root  cwd       DIR              253,0      4096                    2 /\ninit          1        root  rtd       DIR              253,0      4096                    2 /\n[...]\ndaemonxyz 18473        root  txt       REG               0,22  13820130               266929 /share/bin/daemon_xyz (nas:/share)\n\n\n\nWe instead saw:\n\ndaemonxyz 18473        root  txt       REG               0,22  13820130               266929 bin/daemon_xyz\n\n\n\nlsof doesn’t appear to list non-absolute paths otherwise, so we use that to find\nreferences. Unfortunately, lsof’s output is irregular (columns can be blank,\nand the number of columns isn’t fixed between different runs). 
Worse still,\nlsof doesn’t list other filesystem dependencies like memory maps. Files can\nserve as the backing-store of a memory mapping, which is how Linux implements\non-demand paging of program text and shared libraries. So, our script ended up\nlooking at /proc/*/maps, removing known false positives:\n\n#!/bin/bash\n\ncat /proc/*/maps |\n  awk '{print $6}' |\n  grep -v '^/' |         # remove absolute paths\n  grep -v '^$' |\n  grep -v '(deleted)' |\n  grep -v '^.vdso.$' |\n  grep -v '^.heap.$' |\n  grep -v '^.stack.$' |\n  grep -v '^.vsyscall.$' |\n  grep -v '^socket:$'\n\n\n\nThe maps file presents the memory mappings which belong to each process,\nindicating the kind of mapping and, if it’s a file, its path. Like lsof, the\nabsolute path is unavailable if the filesystem hosting the file was lazily\nunmounted. This left us reasonably confident that we’d gotten all the old\nreferences, though we’d still like a better way.\n",
        "url"      : "https://blog.janestreet.com/finding-references-to-lazily-unmounted-filesystems/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "OCaml for the Sysadmins",
        "date"     : "April 23, 2012",
        "authorId" : "jkilburg",
        "author"   : "John Kilburg",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "I sent a link to Yaron Minsky’s article OCaml for the\nMasses to some long-time buddies of\nmine and one of them said that there aren’t enough OCaml systems administration\nprogramming examples.\n\nHe sent me some Python code that produces a string given a hash found in Cisco\nIOS configurations. I wrote an OCaml version along with a function to produce a\nhash from a string.\n\nThis uses Jane Street’s\nCore library which has a\nlot of nifty features so be sure to download and install that.\n\nHere you go Robert:\n\n#!/your-path-here/ocaml\n\nopen Core.Std\n\nlet xlat = [| 0x64; 0x73; 0x66; 0x64; 0x3b; 0x6b; 0x66; 0x6f;\n             0x41; 0x2c; 0x2e; 0x69; 0x79; 0x65; 0x77; 0x72;\n             0x6b; 0x6c; 0x64; 0x4a; 0x4b; 0x44; 0x48; 0x53;\n             0x55; 0x42; 0x73; 0x67; 0x76; 0x63; 0x61; 0x36;\n             0x39; 0x38; 0x33; 0x34; 0x6e; 0x63; 0x78; 0x76;\n             0x39; 0x38; 0x37; 0x33; 0x32; 0x35; 0x34; 0x6b;\n             0x3b; 0x66; 0x67; 0x38; 0x37 |]\n\nlet get_pairs hash =\n  List.rev\n    (Array.fold ~init:[]\n      ~f:(fun acc x -&gt; match x with |[|x|]-&gt;x::acc |_-&gt;failwith \"Bad hash\")\n     (Pcre.extract_all ~full_match:false ~pat:\"(..)\" hash))\n\nlet hash7 hash =\n  let xlater s c =\n    Char.to_string (Char.of_int_exn ((Int.of_string (\"0x\"^c) lxor xlat.(s))))\n  in\n  match get_pairs hash with\n  | salt::rest -&gt;\n      let (result,_) =\n        List.fold\n          ~init:(\"\",Int.of_string(salt))\n          ~f:(fun (result,s) c -&gt; (result ^ (xlater s c), s + 1))\n          rest\n      in\n      result\n  | _ -&gt; failwith \"Mangled hash\"\n\nlet () =\n  printf \"%s\\n\" (hash7 \"03365409031D350C4B080D165705041E093965\")\n\n\n\nHere is the python version:\n\n#!/usr/bin/python\n\nimport re\n\n# Decode bogus Cisco hash\n\ndef hash7(hash):\n  xlat = [0x64, 0x73, 0x66, 0x64, 0x3b, 0x6b, 0x66, 0x6f,\n           0x41, 0x2c, 0x2e, 0x69, 0x79, 0x65, 0x77, 0x72,\n           0x6b, 0x6c, 0x64, 0x4a, 0x4b, 0x44, 
0x48, 0x53,\n           0x55, 0x42, 0x73, 0x67, 0x76, 0x63, 0x61, 0x36,\n           0x39, 0x38, 0x33, 0x34, 0x6e, 0x63, 0x78, 0x76,\n           0x39, 0x38, 0x37, 0x33, 0x32, 0x35, 0x34, 0x6b,\n           0x3b, 0x66, 0x67, 0x38, 0x37 ];\n  if (len(hash) & 1) == 1:\n    return None # Odd length of hash\n  m = re.search('^(..)(.+)', hash)\n  if m:\n    (s, e) = (int(m.group(1)), m.group(2))\n  else:\n    return None\n  result = ''\n  for c in re.findall('[\\da-fA-F]{2}', e):\n    result = result + (\"%c\" % (int(c, 16) ^ xlat[s]))\n    s = s + 1\n  return result\n\nhash = raw_input('Enter Hash: ')\nprint \"%s\" % hash7(hash)\n\n\n",
        "url"      : "https://blog.janestreet.com/ocaml-for-the-sysadmins/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "S-Expressions in ruby",
        "date"     : "March 29, 2012",
        "authorId" : "rdouglass",
        "author"   : "Ralph Douglass",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "We use a lot of S-Expressions at Jane Street. Almost every system written at\nJane Street in OCaml uses sexps for config files, and we use it for a lot of IPC\nwhen resources aren’t an issue.\n\nThat’s all fine and good when you just have OCaml talking to OCaml, but we are a\npractical bunch, and sometimes we want to use a library or open source project\nthat is written for a different language.\n\nLuckily, if the language is Ruby, you don’t need to work very hard, due to an\nexcellent ruby gem called SXP.  Note: you’ll need ruby 1.9 for this\n\n$ sudo gem install rdf sxp\n\n\n\nAnd then you can write something like this:\n\n#!/your/path/to/ruby\n\nrequire 'sxp'\n\nsxp = SXP.read \"(foo bar baz biff)\" # =&gt; [:foo, :bar, :baz, :biff]\n\n# ex.sexp:\n# ((foo 1)\n# (bar 2))\n# ((foo 3)\n# (bar 4))\n# ((foo Hello)\n# (bar snoo))\n\nsxpfile = SXP.read(\"ex.sexp\")\n# =&gt; [[[:foo, 1], [:bar, 2]], [[:foo, 3], [:bar, 4]], [[:foo, :Hello], [:bar, :snoo]]]\n\n\n\nIt’s just that simple.\n\nYou can read more about sxp at http://rubygems.org/gems/sxp\n",
        "url"      : "https://blog.janestreet.com/s-expressions-in-ruby/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Linux disk crypto, now with ease",
        "date"     : "March 28, 2012",
        "authorId" : "pmay",
        "author"   : "Pavel May",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "Imagine that you want to host a Linux machine outside of your own physical\ncontrol. Imagine that it can get resold later to people who may be quite curious\nabout the data on said machine’s HDDs. What do you do? What DO YOU DO?!\n There are many options. One of them is to use the LUKS (Linux\nUnified Key Setup) encryption and just encrypt the whole disk (short of /boot).\n\nOne of LUKS’ advantages is that it is built into the CentOS/Kickstart (even 5.x)\nAnaconda. Enabled via --encrypted --passphrase=&lt;SOMEPHRASE&gt; set of options in\nthe kickstart.cfg file as a part of the “part” directive. Looks like this, in\ncontext:\n\nbootloader --location=mbr --driveorder=sda,sdb --append=\"ide=disable clocksource=hpet processor.max_cstate=1 time\"\nclearpart --all --initlabel --drives=sda\npart /boot --size=200 --asprimary --ondisk=sda\npart pv.1 --size=40000 --grow --ondisk=sda --encrypted --passphrase=test01\nvolgroup vg01 pv.1\nlogvol / --vgname=vg01 --name=root --size=20000\n\n\n\nNote: /boot isn’t encrypted. Do we care? (typically, this is a PITA)\n\nI’ve tested this and the OS just installs. 
The prompt to enter the key happens\nat initrd level when the system reboots, before root is pivoted from initrd to\nthe real volume group.\n\nAn additional concern is what happens if there is power loss – encrypted file\nsystems (on, in this case, the actual PV which underpins the LVM) do not take\nkindly to that sort of a thing and recovery may get Ugly™.\n\nRebooting the machine without interacting with it via a DRAC or some other\nconsole emulation will not work.\n\nAfter the install, we obviously want to change the passphrase, what with\n‘test01’ being somewhat weak and all.\n\nHere’s how to do it (assuming that /dev/sda2 is the encrypted PV)\n\n\n  Boot from rescue media.\n  Run cryptsetup luksAddKey /dev/sda2.\n  At the prompt “Enter any LUKS passphrase:“, enter the current passphrase\n(using the kickstart snippet above, ‘test01’).\n  When prompted, enter the new passphrase. Re-enter to verify it.\n  Make note of the output – it’ll indicate which LUKS crypto slot was used.\nIt’ll be the argument to cryptsetup luksKillSlot /dev/sda2 &lt;SLOT#&gt;.\n  When prompted “Enter any remaining LUKS passphrase:“, enter the new\npassphrase.\n\n\nVoila.\n\nNote: LUKS, like almost everything else, is susceptible\nto the “cold boot” attack. We\nmay need to look at doing TPM with PINs to mitigate some of that, but it doesn’t\nhelp much with the “I have now cooled and removed your RAM. Mu ha ha” case.\n\nOne of the ideas considered was to do a Really Clever Thing ™: The DRAC (Dell\nRemote Access Controller) has the ability to become a “CD” to the system. One\ncould imagine a setup in which, in order to boot, the sysadmin uploads a LUKS\nkey to the storage medium, the system boots, and instead of the passphrase,\nuses a file on the virtual CD as an unlock key. 
Once the machine is up, the\nstorage can be wiped, though wiping FLASH-based storage gets particularly\ntricky.\n\nIt is worth noting that, should it be desirable, it is possible to protect a\nsingle LVOL rather than the whole PV, which makes things like changing pass phrases\nand system boot-time troubleshooting a lot easier.\n\nWe looked at the other common piece of software, TrueCrypt, and for full-disk\nencryption coupled with ease of setup, it does not compete. Its anti-forensic\nfeatures, while impressive, aren’t of interest for us in solving this\nparticular problem, so LUKS it is.\n\nStrongly suggested is the following additional reading:\nhttp://code.google.com/p/cryptsetup/wiki/FrequentlyAskedQuestions\n",
        "url"      : "https://blog.janestreet.com/linux-disk-crypto-now-with-ease/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "ML workshop",
        "date"     : "March 22, 2012",
        "authorId" : "sweeks",
        "author"   : "Stephen Weeks",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "The upcoming ML workshop is September 13th in\nCopenhagen. The goal is to bring together users and researchers, with users\nparticularly welcome to propose a presentation. The workshop is informal, with\nsubmissions limited to two pages and no proceedings.\n\nPlease consider submitting a presentation and attending! The ML workshop\nimmediately follows ICFP, overlaps with CUFP, and is\nimmediately followed by the OCaml Users and Developers\nWorkshop. So there are plenty of opportunities to see neat\nthings and talk to interesting people in the FP world.\n",
        "url"      : "https://blog.janestreet.com/ml-workshop/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Do you solve real problems with FP? Then come talk about it at CUFP!",
        "date"     : "February 21, 2012",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "The 2012 CUFP call for\npresentations is out. CUFP is the\nCommercial Users of Functional Programming workshop, and it’s a great place to\ninteract with a community of people who use functional programming for solving\nreal world problems. The workshop is colocated with ICFP, which will be in\nDenmark this year.\n\nThere’s actually a lot of interesting stuff going on at the tail end of ICFP.\nThere’s going to be an OCaml Users and Developers Workshop the day before the\nCUFP talks, which I’m very much looking forward to. Plus, CUFP organizes a set\nof FP tutorials which I’ve found to be really productive. And there are some\ninteresting-looking new workshops, like a workshop on functional\nhigh-performance computing.\n",
        "url"      : "https://blog.janestreet.com/do-you-solve-real-problems-with-fp-then-come-talk-about-it-at-cufp/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "The adventures in NIC bonding",
        "date"     : "February 17, 2012",
        "authorId" : "pmay",
        "author"   : "Pavel May",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "So. Had an interesting problem happen. On a CentOS 5.5 system\n(2.6.18-194.26.1.el5), an attempt was made to establish a 3-way LACP-style (aka\n802.3ad) bonded NIC interface (bond1). The configuration parameters were set to what\nappeared to be proper things. And yet, things weren’t working (the wrong etherchannel\nmode was being picked).  Relevant configuration(s):\n\n$ cat /etc/modprobe.conf\nalias eth0 igb\nalias eth1 igb\nalias eth2 igb\nalias eth3 igb\nalias scsi_hostadapter megaraid_sas\nalias bond1 bonding\noptions bond1 mode=802.3ad miimon=100\nalias bond0 bonding\noptions bond0 miimon=100 mode=1 max_bonds=2\noptions ib_ipoib send_queue_size=4096 recv_queue_size=4096\nalias ib1 ib_ipoib\nalias ib0 ib_ipoib\n\n$ cat /etc/sysconfig/network-scripts/ifcfg-bond1\nDEVICE=bond1\nIPADDR=XXX.XXX.XXX.XXX\nNETMASK=255.255.252.0\nONBOOT=yes\nBOOTPROTO=none\nUSERCTL=no\n\n$ cat /etc/sysconfig/network-scripts/ifcfg-eth[1-3]\n\n# Intel Corporation 82576 Gigabit Network Connection\n\nDEVICE=eth1\nHWADDR=12:34:56:78:9A:BC\nUSERCTL=no\nONBOOT=yes\nMASTER=bond1\nSLAVE=yes\nBOOTPROTO=none\n\n# Intel Corporation 82576 Gigabit Network Connection\n\nDEVICE=eth2\nHWADDR=12:34:56:78:9A:BC\nUSERCTL=no\nONBOOT=yes\nMASTER=bond1\nSLAVE=yes\nBOOTPROTO=none\n\n# Intel Corporation 82576 Gigabit Network Connection\n\nDEVICE=eth3\nHWADDR=12:34:56:78:9A:BC\nUSERCTL=no\nONBOOT=yes\nMASTER=bond1\nSLAVE=yes\nBOOTPROTO=none\n\n\n\nNo errors of note were being logged in /var/log/messages, but on the switch, we\nwere seeing this:\n\nsh etherchannel summary | include Po57\n57     Po57(SD)        LACP      Gi9/24(I)      Gi9/25(I)      Gi9/26(I)\n\n\n\nThis indicates that the port channel is down and the constituent interfaces are\nrunning in independent mode. 
Quite confusing.\n\nWhat’s more, on the system itself:\n\n$ cat /sys/class/net/bond1/bonding/mode\nactive-backup 1\n\n\n\nThis is wrong, based on the settings, but no clear cause of why that is.\n\nFurther debugging is required, but in the interim, the following worked:\n\nifdown bond1\necho 4 &gt; /sys/class/net/bond1/bonding/mode\nifup bond1\n\n\n\nConfirmed with:\n\n$ cat /sys/class/net/bond1/bonding/mode\n802.3ad 4\n\nAnd on the switch level:\n\nsh etherchannel summary | include Po57\n57     Po57(SU)        LACP      Gi9/24(P)      Gi9/25(P)      Gi9/26(P)\n\n\n\nPresto-chango, we now have a working LACP port channel. Still, something odd\nis afoot.\n\nOne theory is that /etc/modprobe.conf contains definitions for bond0 (an\nInfiniband pair) and it is set up for active-backup (aka ‘mode=1’) – perhaps the\nstartup/init scripts are doing some weird thing and mis-picking up the bond0\nsettings? This is pure conjecture, in need of tracing through the scripts.\n",
        "url"      : "https://blog.janestreet.com/the-adventures-in-nic-bonding/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "OpenLDAP, DBD, and you.",
        "date"     : "February 17, 2012",
        "authorId" : "pmay",
        "author"   : "Pavel May",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "So, today we’ve had us a wee bit of an issue with our OpenLDAP servers – to\nLDAP clients, the server, on attempts to do MOD operations, would say unhelpful\nthings like “error code 80 – other”. (Incidentally, the full error should be\n“Error code: 80 – other (implementation-specific)”.)  After some\ndigging, the logs show:\n\nFeb 14 12:30:02 head02 slapd[19062]: conn=3229729 op=22190 MOD attr=mail\nFeb 14 12:30:02 head02 slapd[19062]: bdb(dc=XXXXXXX,dc=com): Lock table is out of available lock entries\nFeb 14 12:30:02 head02 slapd[19062]: =&gt; bdb_idl_insert_key: c_get failed: Cannot allocate memory (12)\n\n\n\nA slightly better way to ask “WTF, Over?” is thusly:\n\n[root@head02 log]# sudo -u ldap /usr/sbin/slapd_db_stat -h /var/lib/ldap/ -c\n40 Last allocated locker ID\n0x7fffffff Current maximum unused locker ID\n9 Number of lock modes\n1000 Maximum number of locks possible\n1000 Maximum number of lockers possible\n1000 Maximum number of lock objects possible\n80 Number of lock object partitions\n9 Number of current locks\n1063 Maximum number of locks at any one time\n4 Maximum number of locks in any one bucket\n0 Maximum number of locks stolen by for an empty partition\n0 Maximum number of locks stolen for any one partition\n45 Number of current lockers\n47 Maximum number of lockers at any one time\n9 Number of current lock objects\n511 Maximum number of lock objects at any one time\n3 Maximum number of lock objects in any one bucket\n0 Maximum number of objects stolen by for an empty partition\n0 Maximum number of objects stolen for any one partition\n742188 Total number of locks requested\n742179 Total number of locks released\n0 Total number of locks upgraded\n11 Total number of locks downgraded\n0 Lock requests not available due to conflicts, for which we waited\n0 Lock requests not available due to conflicts, for which we did not wait\n0 Number of deadlocks\n0 Lock timeout value\n0 Number of locks that have timed out\n0 Transaction 
timeout value\n0 Number of transactions that have timed out\n872KB The size of the lock region\n4 The number of partition locks that required waiting (0%)\n2 The maximum number of times any partition lock was waited for (0%)\n0 The number of object queue operations that required waiting (0%)\n0 The number of locker allocations that required waiting (0%)\n0 The number of region locks that required waiting (0%)\n3 Maximum hash bucket length\n\n\n\nSpecifically:\n\n1000 Maximum number of locks possible\n[...]\n1063 Maximum number of locks at any one time\n\n\n\nOne of the peculiarities inherent to BerkeleyDB is the notion of the\n“environment” (a directory which contains the BerkeleyDB files) which defines\nrun-time parameters, and once the databases (in this case – the actual LDAP\ndata as well as associated indexes) are initialized, the values are actually\nbaked in. There is a file called DB_CONFIG, but it, too, only applies at\ninitialization time. Or recovery time. Yep. In order to apply different run-time\nparameters, the database has to be “recovered”. (There is a third option –\nwriting C code to push a new run-time configuration struct into the environment,\nbut That Way Madness Lies a.k.a. much higher potential for getting it Wrong)\n\nSo, had to stop slapd (do a recovery without doing that, and all sorts of FUN\nwill ensue), modify DB_CONFIG, and then run\nsudo -u ldap /usr/sbin/slapd_db_recover -h /var/lib/ldap -v (Making backups of\nthe BDB environment is a Good Idea, just in case)\n\nOnce that completes (pretty quick), restart slapd and verify that the new\nrun-time parameters are in effect. 
Happy happy, joy joy?\n\nThe actual DB_CONFIG change:\n\n# pmay, 14February2012\n# We’re running out of locks.\n# Errors:\n# bdb(dc=XXXXXX,dc=com): Lock table is out of available lock entries\n# =&gt; bdb_idl_insert_key: c_get failed: Cannot allocate memory (12)\n# sudo -u ldap /usr/sbin/slapd_db_stat -h /var/lib/ldap -c\n# -&gt; shows that the max 1000 locks is being exceeded by the Max # of locks at any one time.\n# Trying to increase it via setting of\n# set_lk_max_locks to 1500\n# set_lk_max_locks 1500\n\n\n",
        "url"      : "https://blog.janestreet.com/openldap-dbd-and-you/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Iterative email problem solving with python, Part 2",
        "date"     : "February 2, 2012",
        "authorId" : "phahn",
        "author"   : "Patrick Hahn",
        "tags"     : [],
        "minsToRead" : 4,
        "content"  : "Once we’d fixed up all our message IDs (see part\n1) we let imapsync loose on the first\nfew beta testers who noticed that the dates on many of their older messages were\nwrong, sometimes hilariously so. This was pretty confusing since we weren’t\ntouching the dates as part of the migration; how had they changed?\n\nFortunately the observed dates were a clue: many of them were the date of the\nprevious migration from an even older mail system. We had managed to set the\nmessage date in imap to the date of that migration years ago. Because many mail\nclients use the Date: header to order messages, nobody noticed until they\nswitched to the new system, which uses the imap server’s timestamp. It definitely\nhad to be fixed, and fortunately we already had a nice platform to extend with\nthis functionality.\n\nDate math is notoriously fiddly; doing the simple and obvious thing is almost\nalways wrong. Fortunately the python datetime library encapsulates most of the\ncomplexity, allowing us to make a few assumptions about the timezone (that it’s\nlocal to the box this is running on) and let it handle the rest:\n\nmtime = os.stat(fullpath).st_mtime\ndate_mtime = datetime.datetime.fromtimestamp(mtime, dateutil.tz.gettz())\n\nfor hname in [\"Delivery-date\", \"Date\"]:\n  if message.has_key(hname):\n    date_header = dateutil.parser.parse(message[hname])\n    if not date_header.tzinfo:\n      date_header = date_header.replace(tzinfo=dateutil.tz.gettz())\n    break\n\ndelta = abs(date_header - date_mtime)\nif delta &gt; datetime.timedelta(days=2):\n  new_mtime = time.mktime(date_header.timetuple())\n  os.utime(fullpath, (new_mtime, new_mtime))\n\n\n\nWe merged this into the maildir lint script from the previous post, did the\nrequisite testing to make sure it was behaving as expected and ran it on the\nmaildirs of our intrepid beta testers. 
Which presented us with another problem:\nyou can’t modify the date of a message via imap without deleting it and\nre-adding it to the server. We elected to simply bulk-delete the mailboxes (with\nanother python script that’s four lines of deleting things and two dozen lines\nof sanity checking) and re-run the whole synchronization process from scratch.\n\nThere are tens of thousands of messages per mailbox so we let it run over a\nweekend, came back Monday and…the dates were still wrong. In fact the dates on\nthe synchronized messages were unchanged from the first attempt even though the\nmtimes were correct. A quick trip to djb’s original Maildir specification\ndemonstrated just how wrong we had been. Not only does the mtime not record the\ntimestamp, mtime isn’t used in the spec at all. Instead the first component of\nthe message’s filename is its date. A few facepalms and a quick modification to\nthe script later and we were ready for the third try:\n\ndate_maildir = datetime.datetime.fromtimestamp(float(os.path.basename(fullpath).split('.')[0]))\nif not date_maildir.tzinfo:\n  date_maildir = date_maildir.replace(tzinfo=dateutil.tz.gettz())\n\nif abs(date_header - date_maildir) &gt; datetime.timedelta(days=2):\n  head, tail = os.path.split(fullpath)\n  new_file = os.path.join(head,\n    re.sub(\"^\\d+\\.\",\n      str(int(time.mktime(date_header.timetuple()))) + '.',\n    tail))\n  if len(new_file) != len(fullpath):\n    # If someone is running this in 2286 I sincerely apologize.\n    raise Exception(\"Sanity check failed for renaming %s to %s\" % (fullpath, new_file))\n  os.rename(fullpath, new_file)\n\n\n\nWe again blew away the migrated mailboxes, re-linted the maildirs and\nre-synchronized them. The dates were correct on the other side this time so we\ndeclared victory on this small slice of the migration.\n",
        "url"      : "https://blog.janestreet.com/iterative-email-problem-solving-with-python-part-2/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Iterative email problem solving with python.",
        "date"     : "January 20, 2012",
        "authorId" : "phahn",
        "author"   : "Patrick Hahn",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "One of the little joys of being a sysadmin is being able to solve a problem by\ncomposing a highly specific tool out of a set of much more general tools or\nlibraries. The evolution of this maildir linting program demonstrates how\neffective this model is and illustrates a bit of the iterative problem solving\nthat’s a sysadmin’s bread and butter. We get a basic level of functionality in\nthis post and a coming Part 2 will extend it to adapt to a new set of\nrequirements.  While migrating from our current mail system to a\nnewer, shinier one we discovered that there were a number of messages that\nimapsync couldn’t process due to missing Message-ID headers. Fortunately we\nstore mail on the backend in Maildir format so we can easily iterate over them\nwithout worrying about locking and we can use Python’s\n\nbuilt-in email library (and\nos.walk) to handle all the\nmessy parsing and header-splitting:\n\nfor dirname, dirnames, filenames in os.walk(maildir):\n  for thisfile in filenames:\n      fullpath = os.path.join(dirname, thisfile)\n      f = open(fullpath, 'r') # we might be using an older python\n      data = f.read()\n      message = message_from_string(data)\n      if 'MESSAGE-ID' not in (header.upper() for header in message.keys()):\n        msgid = utils.make_msgid(\"RETCON.%f.%i\" % (time.time(), total_count))\n        message.add_header(\"Message-Id\", msgid)\n        total_count = total_count + 1\n\n\n\nOf course this is probably fine, but we’d really like to be able to see what it’s\ndoing just in case it doesn’t do what we expect. Another little corner of the\nstandard library is difflib:\n\nbefore = data.split('\\n')\nafter = message.as_string().split('\\n')\nfor l in difflib.unified_diff(before, after, tofile=fullpath):\n  print l\n\n\n\nAnd for further paranoia we’d really like to save a backup of the messages we\nmodified. 
Look no further than\ntarfile for a nifty way to create\na tbz2 without creating any intermediates to clog things up:\n\nbackuptar = tarfile.open(full_backup_path, \"w|bz2\")\n...\nbackuptar.add(fullpath)\n\n\n\nAt this point we add some error checking and command line parsing and have not\nonly a solution to the immediate problem but a nice platform to extend should we\nfind other things that need adjusting during the migration.\n",
        "url"      : "https://blog.janestreet.com/iterative-email-problem-solving-with-python/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Making staging explicit",
        "date"     : "January 12, 2012",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 5,
        "content"  : "We just had an interesting discussion around the office about staging and\ncurried functions. I was reading over a new string-escaping routine, which had\nthe following signature:\n\nval escape : escapeworthy:Char.t list -&gt; escape_char:Char.t -&gt; string -&gt; string\n\n\n\nAnd the code had a big comment explaining that the performance of this was much\nbetter if you first partially applied the function, and then fed in the strings\nto be escaped one by one. In other words, this:\n\nList.map l ~f:(fun s -&gt; String.escape ~escapeworthy:[a;z] ~escape_char:_ s)\n\n\n\nis 2-10x slower than this:\n\nlet my_escape = String.escape ~escapeworth:[a;z] ~escape_char:_ in List.map l ~f:my_escape\n\n\n\nThat’s because when my_escape is computed, a bunch of work is done just once\nthat can be shared over multiple calls to my_escape, whereas a fully-applied\ncall to String.escape will redo that computation each time. But there’s no way\nof really seeing this by just looking at the type signature, and it’s not\nobvious from the call-point either. In this case, the issues is just one of\nperformance, but the difference between true staging and mere currying can be a\nsemantic one as well. Consider this function for creating a unique-id allocator.\n\nlet make_id_allocator () =\n  let ctr = ref 0 in\n  (fun () -&gt; incr ctr; !ctr)\n\n\n\nThe signature of this function looks a little odd at first glance:\n\nval make_id_allocator : unit -&gt; unit -&gt; int\n\n\n\nThis is truly a staged function, and there is an important semantic difference\nbetween partial and full applications. In particular, if we always use it as a\nfull application, like this:\n\nlet x = make_id_allocator () ()\nlet y = make_id_allocator () ()\n\n\n\nthe result will always be 1. 
Partial application, however, can give us\ndifferent answers:\n\nlet alloc = make_id_allocator ()\nlet x = alloc ()\nlet y = alloc ()\n\n\n\nHere, x is 1 and y is 2.\n\nThe basic problem here is that the type signatures don’t tell you when there is\ntrue staging in a function, versus when the function is simply curried. In some\nways, I prefer the state of affairs in SML, where the default is that function\narguments are tuples, and curried functions are a special case.\n\nBut we can recover some of that explicitness with a minimum of fuss, and that’s\njust what we’re about to do in Core. We’ve done this by adding a new type,\ncalled Staged.t:\n\nopen Core.Std\n\nmodule Staged : sig\n  type 'a t\n  val stage : 'a -&gt; 'a t\n  val unstage : 'a t -&gt; 'a\nend = struct\n  type 'a t = 'a\n  let stage = Fn.id\n  let unstage = Fn.id\nend\n\n\n\nand we put stage and unstage at the top-level. Now, we can write our ID\nallocator this way:\n\nlet make_id_allocator () =\n  let ctr = ref 0 in\n  stage (fun () -&gt; incr ctr; !ctr)\n\n\n\nwhich has this signature:\n\nval make_id_allocator : unit -&gt; (unit -&gt; int) Staged.t\n\n\n\nAnd we can use it like this:\n\nlet alloc = unstage (make_id_allocator ())\nlet x = alloc ()\nlet y = alloc ()\n\n\n\nBut what we can’t do is just use the function without noticing it’s staged. If\nwe want to rerun the initial stage every time, we have to do this:\n\nlet x = unstage (make_id_allocator ()) ()\nlet y = unstage (make_id_allocator ()) ()\n\n\n\nwhich makes it quite clear what’s going on. 
Similarly, with the original string\nescaping function, we can change the definition to this:\n\nval escape : escapeworthy:Char.t list -&gt; escape_char:Char.t -&gt; (string -&gt; string) Staged.t\n\n\n\nand now, it’s much clearer that this code:\n\nList.map l ~f:(fun s -&gt; unstage (String.escape ~escapeworthy:[a;z] ~escape_char:_) s)\n\n\n\nis doing repeated work, and it pushes you in the direction of writing this\ninstead:\n\nList.map l ~f:(unstage (String.escape ~escapeworthy:[a;z] ~escape_char:_))\n\n\n\nOne thing that I find satisfying about this example is that it shows how in a\nlanguage like OCaml, you can often overcome what feel like deficiencies in the\nlanguage itself simply by writing better libraries.\n",
        "url"      : "https://blog.janestreet.com/making-staging-explicit/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Fork Core!",
        "date"     : "January 7, 2012",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "We’ve now put our collection of open-source libraries out on bitbucket. You can\nfind and browse the code here.\n\nBy putting the code on bitbucket, we hope to make it easier for us to work\ntogether with other people in the OCaml community. We’re just starting (I just\nstarted telling people that we’re ready to accept patches a few days ago), and\nwe already have some\nimprovements coming\nin.\n\nWe’re still working out how the relationship with external contributors is going\nto work, but I’m very hopeful that this will hlep make Core a foundation\nsuitable for lots of people outside of Jane Street to use for building their own\napplications.\n\nIf you’re interested in contributing, join us on the mailing\nlist. \n",
        "url"      : "https://blog.janestreet.com/fork-core/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Moving files via rsync",
        "date"     : "January 3, 2012",
        "authorId" : "rdouglass",
        "author"   : "Ralph Douglass",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "One of my favorite features in rsync is using it to ‘move’ files between boxes.\n\nrsync --remove-source-files\n\n\n\nremoves files from the source once rsync determines that the files exist on the\ndestination. A common use around here:\n\n$ rsync -av --remove-source-files\n  production-box:/some-system/output-files/\n  /big-storage-device/archives/some-system/$(date +%Y-%m-%d)/\n\n\n\nYou can also use an undocumented flag which does almost exactly the same thing.\n\nrsync --remove-sent-files\n\n\n\nsame as –remove-source-files, except that it leaves behind any files for which\nno update was necessary on the destination side (including metadata if you are\nusing a flag like -a).\n\nThe difference is subtle, and you probably usually want –remove-source-files.\nHowever, if you expect to always move to a clean location, remove-sent-files\nleaving behind something can act as a canary, telling you something is not as\nyou expected. It’s may be too late for that day’s data, but at least you know\nsomething wacky is going on for the next day. It’s helped me once, so I default\nto it in most cases.\n",
        "url"      : "https://blog.janestreet.com/moving-files-via-rsync/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "OCaml, the ultimate refactoring tool",
        "date"     : "November 2, 2011",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "This is why I love OCaml.\n\nI’ve been working on a big refactor of one of our core trading systems. It was a\nserious change. This is a system with many cooperating components, and in the\nold design, a certain amount of stitching work was required to integrate each\nnew component. The end result was that the core of the system was becoming\novergrown and hard to understand and extend.\n\nSo, I replaced this ad-hoc component system with a plug-in architecture where\nevery component registered itself in a uniform way, with a flexible, type-safe\nway for the components to communicate. Now, no changes are now required to the\ncore when you add a new component.\n\nWhile I was at it, I entirely replaced the mechanism for querying the system\nwith a new one that fit more cleanly into this new plug-in architecture, and at\nthe same time providing a better, more typeful interface to clients of the\nsystem, with integrated help and support for cross-version interoperability. We\nalso pulled out the configurations for the components into their own files with\nan explicit versioning scheme, to make it easier for outside tools that need to\ngenerate configs to deal with the system as it evolves.\n\nAll told, the change were pretty invasive. I had to modify virtually every file\nin the system, changing how the components interacted with each other, with a\nhandful of components needing to be seriously rewritten.\n\nA couple days ago, another developer here helped put a few finishing touches on,\nmerged in patches that had occurred in other branches, and got it up and running\nin test for the first time.\n\nAnd the shocking thing is, it worked.\n\nI’m not claiming that no bugs were introduced in the refactoring; surely we have\nmore testing work ahead of us. But I found it stunning that the whole system\nstarted up and worked without incident, with no testing whatsoever. And it’s not\nthat I’m such a great programmer. 
This kind of result is pretty routine, because\nthe type system is so effective at catching bugs, particularly bugs in code that\nhas been written carefully to make the most of the type system.\n\nPeople often talk about the value of refactoring tools that automate the tedious\nwork of renaming method calls and changing the order of arguments. Surely that’s\na nice thing to have. But it’s nothing compared to the help you get from OCaml’s\ntype system. OCaml doesn’t automate the work of refactoring, but it does greatly\nreduce the number of bugs that your refactoring introduces. And in the end,\nthat’s far more valuable.\n",
        "url"      : "https://blog.janestreet.com/ocaml-the-ultimate-refactoring-tool/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Announcing Async",
        "date"     : "October 25, 2011",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "We just released a new monadic concurrency library called\nAsync.\n\nAsync isn’t really new, at least, not within Jane Street. We’ve been using Async\nfor years, and it is now at the heart of almost all the systems we build.\nIndeed, developers at Jane Street have often bemoaned the fact that they find it\npainful to do projects at home because they miss Async. If you’re building any\nkind of code that touches the network, Async is a huge help. That internal\ndemand is part of why we’re releasing Async. Beyond that, so much of our own\ninternal software is built with Async that the lack of a public release for\nAsync made it hard to release anything else.\n\nBut we also hope that Async will be directly useful to others. It’s been\nbattle-tested by years of use, dozens of programmers and many applications. If\nyou’re building a networked service in OCaml, Async can be an excellent building\nblock.\n\nThat said, it’s still early days for Async outside of our walls. We’ve done\nalmost no work to get Async compiling and working well outside of Linux, for\nexample. There are also a number of improvements that we’ve made inside our\nwalls that haven’t yet made it to the public release branch.\n\nMore generally, we’re in the middle of trying to shift around how we manage the\npublic releases of our software. We’ve created a mailing list:\n\nocaml-core@googlegroups.com\n\nand we’re in the process of moving to using github for managing our external\nsource repositories. 
We’re hoping that Core and Async and the other associated\nlibraries can become a real foundation for people building production software\nwith OCaml.\n\nI’ll try to do some more blog posts about Async in the coming weeks, but one\nquestion that many experienced OCaml-hands might wonder is, how does this relate\nto the other monadic concurrency library for OCaml, Lwt?\n\nLwt is by all accounts an excellent piece of software, and we’ve actually stolen\na number of good ideas from them over the years. But we decided years back to\ncreate our own library rather than use Lwt because there were aspects of the\ndesign that we didn’t like. In particular, we think that Async does a better job\nof controlling the concurrency of your program, making it easier to reason about\npossible race conditions. And we prefer Async’s approach to error handling,\nwhich we think does a better job of handling OCaml exceptions. Also, Async is\nwritten almost entirely in OCaml, whereas Lwt has thousands of lines of C code,\nwhich we think gives Async an edge in terms of our confidence in the\nimplementation.\n\nThat said, there are a lot of nice things about Lwt that Async doesn’t have. Lwt\nhas its own monadic syntax extension, a system of pluggable engines for\nsupporting different polling mechanisms, and a few other niceties. We imagine\nthat over time, the two libraries will exchange more ideas and will both get\nbetter as a result.\n\nIf you’re interested in Async, you can find it and the rest of the Core suite\nhere.\n",
        "url"      : "https://blog.janestreet.com/announcing-async/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "+'a and -'a",
        "date"     : "October 17, 2011",
        "authorId" : "sweeks",
        "author"   : "Stephen Weeks",
        "tags"     : [],
        "minsToRead" : 7,
        "content"  : "If you’ve ever wondered what it means in OCaml when there is a + or - in\nfront of a type variable, read on.\n\nOCaml has subtyping, which is a binary relation on types that says roughly, if\ntype t1 is a subtype of type t2, then any value of type t1 can be used\nanywhere a value of type t2 was expected. In OCaml, subtyping arises from\npolymorphic variants. For example, [ A ] is a subtype of [ A | B ]\n(typographical note: in this post, polymorphic variants are written without the\nleading grave accent (backtick) due to formatting issues). Why? Because if you\nhave a piece of code that knows how to deal with A and B, well then surely\nit can deal with just A.\n\nOne can ask OCaml to check subtyping using a coercion expression:\n\n(e : t1 :&gt; t2)\n\n\n\nFor example:\n\nlet f x = (x : [ A ] :&gt; [ A | B ])\n\n\n\nA coercion expression actually does two things; it verifies that t1 is a\nsubtype of t2, and causes the type of the entire expression to be t2.\n\nJust as the OCaml typechecker has rules for deciding when a let expression or a\nfunction call typechecks, it has rules for deciding when the subtyping in a\ncoercion expression is allowed. The subtyping rule for polymorphic variant types\nis clear – one polymorphic variant t1 is a subtype of another polymorphic\nvariant t2 if every constructor in t1 also appears in t2 (and with the\nsame type argument). OCaml then has rules to extend the subtyping relation to\nmore complex types. E.g. for tuples the rule is\n\nif t1 :&gt; t1' and t2 :&gt; t2' then (t1, t2) :&gt; (t1', t2')\n\n\n\nFor example:\n\nlet f x = (x : [ A ] * [ B ] :&gt; [ A | C ] * [ B | D ])\n\n\n\nThe rule for subtyping on tuples makes intuitive sense if you think about what\ncode could do with a tuple – it can take apart the pieces and look at them. 
So,\nif a tuple has fewer kinds of values in both its first and second components,\nthen code dealing with the tuple would still be fine.\n\nFor arrow types the subtyping rule is:\n\nif ta' :&gt; ta and tr :&gt; tr' then ta -&gt; tr :&gt; ta' -&gt; tr'\n\nFor example:\n\nlet f x = (x : [ A | B ] -&gt; [ C ] :&gt; [ A ] -&gt; [ C | D ])\n\n\n\nAgain, the rule makes sense if you think about what code can do with a function\nthat it has. It can feed the function arguments, and observe the results. So, if\na function can accept more kinds of inputs or returns fewer kinds of outputs,\nthen the code dealing with the function would still be fine.\n\nFor types in OCaml like tuple and arrow, one can use + and -, called\n“variance annotations”, to state the essence of the subtyping rule for the type\n– namely the direction of subtyping needed on the component types in order to\ndeduce subtyping on the compound type. For tuple and arrow types, one can write:\n\ntype (+'a, +'b) t = 'a * 'b\ntype (-'a, +'b) t = 'a -&gt; 'b\n\n\n\nOne sometimes hears the term “covariant” to describe a type variable\nannotated with a + and “contravariant” to describe a type variable\nannotated with a -. For example, the tuple type is covariant in all its\narguments. The arrow type constructor is contravariant in its first argument and\ncovariant in its second argument.\n\nIf you don’t write the + and -, OCaml will infer them for you. So, why does\none need to write them at all? Because module interfaces are designed to express\nthe contract between implementor and user of a module, and because the variance\nof a type affects which programs using that type are type correct. 
For example,\nsuppose you have the following:\n\nmodule M : sig\n  type ('a, 'b) t\nend = struct\n  type ('a, 'b) t = 'a * 'b\nend\n\n\n\nShould the following typecheck or not?\n\nlet f x = (x : ([ A ], [ B ]) M.t :&gt; ([ A | C ], [ B | D ]) M.t)\n\n\n\nIf one knows that ('a, 'b) M.t = 'a * 'b, then yes, it should type check. But\nthe whole point of an interface is that a user only knows what the interface\nsays. And it does not say that ('a, 'b) M.t = 'a * 'b. So in fact it does not\ntype check.\n\nVariance annotations allow you to expose the subtyping properties of your type\nin an interface, without exposing the representation. For example, you can say:\n\nmodule M : sig\n  type (+'a, +'b) t\nend = struct\n  type ('a, 'b) t = 'a * 'b\nend\n\n\n\nThis will give enough information to the OCaml type checker to typecheck uses of\nM.t for subtyping so that the following will type check.\n\nlet f x = (x : ([ A ], [ B ]) M.t :&gt; ([ A | C ], [ B | D ]) M.t)\n\n\n\nWhen you use variance annotations in an interface, OCaml will check that the\nimplementation matches the interface, as always. For example the following will\nfail to typecheck:\n\nmodule M : sig\n  type (+'a, +'b) t\nend = struct\n  type ('a, 'b) t = 'a -&gt; 'b\nend\n\n\n\nWhereas the following will typecheck:\n\nmodule M : sig\n  type (-'a, +'b) t\nend = struct\n  type ('a, 'b) t = 'a -&gt; 'b\nend\n\n\n\nHopefully this sheds some light on a somewhat obscure corner of the language.\n",
        "url"      : "https://blog.janestreet.com/a-and-a/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "A mailing list for Core",
        "date"     : "October 3, 2011",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "After giving a tutorial on the Core libraries at CUFP, it’s become clear that we\nneed to provide a more open forum for people to talk about (and eventually\ncontribute to) Core. As a small starting point, we have a brand-spanking-new\nmailing list that you can use for asking questions. here’s the address:\n\nocaml-core@googlegroups.com\n\nPlease use this in preference to opensource@janestreet.com when you have a\nquestion about Core.\n",
        "url"      : "https://blog.janestreet.com/a-mailing-list-for-core/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Coming to ICFP/CUFP? Propose a BOF!",
        "date"     : "September 13, 2011",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "CUFP will again be having Bird-of-a-Feather sessions to discuss a variety of FP\ntopics. Last year’s BOFs were a lot of fun, with topics ranging from\nmeta-programming to packaging systems.\n\nBut we need you to help! The BOFs only happen if people propose them. Click\nhere to learn more about the BOFs and to find out\nhow to propose a topic.\n",
        "url"      : "https://blog.janestreet.com/coming-to-icfpcufp-propose-a-bof/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Using types to track defaults",
        "date"     : "August 2, 2011",
        "authorId" : "sweeks",
        "author"   : "Stephen Weeks",
        "tags"     : [],
        "minsToRead" : 14,
        "content"  : "We use OCaml’s optional arguments a fair bit at Jane Street. One nagging problem\nhas been coming up with a good way of documenting in the mli for a library\nwhat the default value for an optional argument is. Of course, one could state\nthe default value in a comment, but one is not forced to do so; also the comment\nmay become stale and incorrect, so to be a sure a reader has to look at the ml\nfile to be sure what the actual default is.\n\nIt would be nice if at least for constants, one could specify the default in the\ntype of a function, and have the default enforced by the type checker. Something\nlike:\n\nmodule M : sig\n  val f : ?(i:int = 13) -&gt; ?(b:bool = false) -&gt; unit -&gt; int * bool\nend = struct\n  let f ?(i = 13) ?(b = false) () = i, b\nend\n\n\n\nThe following program would cause a type error due to the default for i not\nmatching the type.\n\nmodule M : sig\n  val f : ?(i:int = 14) -&gt; ?(b:bool = false) -&gt; unit -&gt; int * bool\nend = struct\n  let f ?(i = 13) ?(b = false) () = i, b\nend\n\n\n\nI’ve recently been trying to come up with an acceptably terse way of doing\nsomething like this without changing the language or type system. I have an\napproach that I’ll now explain.\n\nStart by defining a type constructor for a family of “singleton” types, each\ninstance of which has a single value:\n\ntype ('phantom, 'real) singleton\n\n\n\nThe 'real type argument is the actual type of the value (e.g. bool, int,\netc.). The 'phantom type is used to distinguish every singleton type from\nevery other singleton type. 
Here are some instances of the singleton type\nfamily.\n\ntype thirteen = (phantom_thirteen, int) singleton\ntype true_ = (phantom_true, bool) singleton\ntype false_ = (phantom_false, bool) singleton\n\n\n\nEach of these singleton types is inhabited by a single value.\n\nval thirteen : thirteen\nval true_ : true_\nval false_ : false_\n\n\n\nNext, define a type constructor for optional arguments that have a default,\nwhere the type argument is the singleton type that identifies what the default\nvalue is.\n\ntype 'singleton is_the_default\n\n\n\nUsing this we can write the type of our f function as:\n\nval f : ?(i:thirteen is_the_default) -&gt; ?(b:false_ is_the_default) -&gt; unit -&gt; int * bool\n\n\n\nSo that we can call such functions, we define an override function that allows\none to override a default value with an actual value at a call.\n\nval override : 'real -&gt; (_, 'real) singleton is_the_default\n\n\n\nBecause override can produce any phantom type, it can override any singleton\ntype, so long as the real types agree.\n\nHere are some example calls to f.\n\nf ~i:(override 17) ()\nf ~b:(override false) ()\nf ~i:(override 17) ~b:(override false) ()\n\n\n\nFor convenience, we bind override to a prefix operator !!. Then usage is\nquite concise.\n\nlet (!!) = override\nf ~i:!!17 ()\nf ~b:!!false ()\nf ~i:!!17 ~b:!!false ()\n\n\n\nTo implement f in such a way that the type system enforces that the default is\nwhat the type says it is, we need another function:\n\nval defaults_to :\n    ('phantom, 'real) singleton is_the_default option\n  -&gt; ('phantom, 'real) singleton\n  -&gt; 'real\n\n\n\ndefaults_to returns the value of the override if it is provided, else it\nreturns the (only) value in the singleton type if not.\n\nNow we can define f.\n\nlet f ?i ?b () =\n  let i = defaults_to i thirteen in\n  let b = defaults_to b false_ in\n  i, b\n\n\n\nThe last piece we need is a way to define new singleton types. 
For that we use a\nfunction that takes the representative value and produces a module with a new\nphantom type, along with the single value of the new singleton type.\n\nmodule type Singleton = sig\n  type phantom\n  type real\n  type t = (phantom, real) singleton\n  val t : t\nend\n\nval singleton : 'a -&gt; (module Singleton with type real = 'a)\n\n\n\nWe can now use singleton to define the singleton types and values that we\nneed:\n\nmodule Thirteen = (val singleton 13 : Singleton with type real = int)\nmodule True_ = (val singleton true : Singleton with type real = bool)\nmodule False_ = (val singleton false : Singleton with type real = bool)\ntype thirteen = Thirteen.t\nlet thirteen = Thirteen.t\ntype true_ = True_.t\nlet true_ = True_.t\ntype false_ = False_.t\nlet false_ = False_.t\n\n\n\nThat’s the entire interface to optional arguments with default in their type.\nFor completeness, here is the interface in one place.\n\nmodule type Optional = sig\n  type ('phantom, 'real) singleton\n  type 'singleton is_the_default\n\n  val override : 'real -&gt; (_, 'real) singleton is_the_default\n\n  val defaults_to :\n    ('phantom, 'real) singleton is_the_default option\n    -&gt; ('phantom, 'real) singleton\n    -&gt; 'real\n\n  module type Singleton = sig\n    type phantom\n    type real\n    type t = (phantom, real) singleton\n    val t : t\n  end\n\n  val singleton : 'a -&gt; (module Singleton with type real = 'a)\nend\n\n\n\nThe implementation of Optional is trivial. 
Singletons and defaults are just\nthe underlying value.\n\nmodule Optional : Optional = struct\n  type ('phantom, 'real) singleton = 'real\n  type 'singleton is_the_default = 'singleton\n\n  let override x = x\n\n  let defaults_to opt default =\n    match opt with\n    | None -&gt; default\n    | Some x -&gt; x\n  ;;\n\n  module type Singleton = sig\n    type phantom\n    type real\n    type t = (phantom, real) singleton\n    val t : t\n  end\n\n  let singleton (type t) (t : t) =\n    (module struct type phantom\n      type real = t\n      type t = real\n      let t = t\n    end : Singleton with type real = t)\n  ;;\nend\n\n\n\nAnd here’s some example code to test the new module.\n\ninclude struct\n  open Optional\n  type 'a is_the_default = 'a Optional.is_the_default\n  let defaults_to = defaults_to\n  let (!!) = override\n  let singleton = singleton\n  module type Singleton = Singleton\nend\n\nmodule Bool : sig\n  type t = bool\n  module True_ : Singleton with type real = t\n  module False_ : Singleton with type real = t\nend = struct\n  type t = bool\n  module True_ = (val singleton true : Optional.Singleton with type real = t)\n  module False_ = (val singleton false : Optional.Singleton with type real = t)\nend\n\ninclude struct\n  open Bool.True_\n  type true_ = t\n  let true_ = t\nend\n\ninclude struct\n  open Bool.False_\n  type false_ = t\n  let false_ = t\nend\n\nmodule Test_bool : sig\n  val f :\n    ?x:true_ is_the_default\n    -&gt; ?y:false_ is_the_default\n    -&gt; unit -&gt; bool * bool\nend = struct\n  let f ?x ?y () =\n    let x = defaults_to x true_ in\n    let y = defaults_to y false_ in\n    x, y\n  ;;\nend\n\nlet () =\n  let f = Test_bool.f in\n  assert ((true , false) = f ());\n  assert ((false, false) = f ~x:!!false ());\n  assert ((false, true ) = f ~x:!!false ~y:!!true ());\n;;\n\nmodule Int : sig\n  type t = int\n  module N_zero : Singleton with type real = t\n  module N_one : Singleton with type real = t\n  module N_million : Singleton 
with type real = t\nend = struct\n  type t = int\n  module N_zero = (val singleton 0 : Optional.Singleton with type real = t)\n  module N_one = (val singleton 1 : Optional.Singleton with type real = t)\n  module N_million = (val singleton 1_000_000 : Optional.Singleton with type real = t)\nend\n\nmodule Test_int : sig\n  val f :\n    ?x:Int.N_zero.t is_the_default\n    -&gt; ?y:Int.N_one.t is_the_default\n    -&gt; ?z:Int.N_million.t is_the_default\n    -&gt; unit\n    -&gt; int * int * int\nend = struct\n  let f ?x ?y ?z () =\n    let x = defaults_to x Int.N_zero.t in\n    let y = defaults_to y Int.N_one.t in\n    let z = defaults_to z Int.N_million.t in\n    x, y, z\n  ;;\nend\n\nlet () =\n  let f = Test_int.f in\n  assert ((0, 1, 1_000_000) = f ());\n  assert ((0, 1,        13) = f ~z:!!13 ());\n  assert ((1, 2,         3) = f ~x:!!1 ~y:!!2 ~z:!!3 ());\n;;\n\n\n\nWith 3.13, the usage will be nicer because we won’t have to state the redundant\npackage type when we define new singletons. E.g. we will be able to write the\nfollowing:\n\nmodule True_ = (val singleton true)\n\n\n\nBut even without that, it doesn’t seem too painful to start using this right\nnow, since one only needs to define a few singleton types for the common default\nvalues, and the actual definition and use of functions with optional arguments\nis pretty syntactically lightweight.\n\nComments or suggestions for improvement anyone?\n",
        "url"      : "https://blog.janestreet.com/using-types-to-track-defaults/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Rethinking Univ",
        "date"     : "July 31, 2011",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 5,
        "content"  : "A few years back, Stephen wrote a fun\npost about how to build a so-called\n“universal type” in OCaml. Such a type allows you to embed any other type within\nin it, letting you do things like creating ad-hoc lists containing elements of\nmultiple different types.\n\nI’ve been thinking about universal types again because I’ve been working on a\nproject lately that uses a universal type as a central architectural piece. Most\nML programmers find the idea of a universal type a little jarring, and so I’ve\nbeen thinking about how to present it to make it easy to understand. \nPerhaps the first thing to consider is what the signature should look like.\nHere’s what I started with:\n\nmodule Univ : sig\n  module Type : sig\n    type 'a t\n    val create : unit -&gt; 'a t\n  end\n\n  type t\n  val embed : 'a Type.t -&gt; 'a -&gt; t\n  val project : 'a TYpe.t -&gt; t -&gt;'a option\nend\n\n\n\nAnd here’s a simple example of Univ in action.\n\nlet int_type = (Univ.Type.create () : int Univ.Type.t)\nlet string_type = (Univ.Type.create () : string Univ.Type.t)\n\nlet mixed_list = [ Univ.embed int_type 3\n                 ; Univ.embed string_type \"whatever\"\n                 ; Univ.embed int_type 5 ]\n\nlet () =\n  assert (List.filter_map ~f:(Univ.project int_type ) mixed_list\n         = [3;4]);\n  assert (List.filter_map ~f:(Univ.project string_type) mixed_list\n         = [\"whatever\"]);\n\n\n\nBut there’s something pointlessly confusing about this type. For one thing, the\nuse of the term “type” makes promises that can’t quite be satisfied. For\ninstance, there’s no guarantee that two values of the same type embedded into\nUniv.t can be reached in the same way. 
Consider this example.\n\nlet int_type = (Univ.Type.create () : int Univ.Type.t)\nlet int_type' = (Univ.Type.create () : int Univ.Type.t)\n\nlet mixed_list = [ Univ.embed int_type 3\n                 ; Univ.embed int_type' 4\n                 ; Univ.embed int_type 5 ]\n\nlet () =\n  assert (List.filter_map ~f:(Univ.project int_type) mixed_list\n          = [3;5]);\n\n\n\nWhen I tried to explain what Univ was to people verbally, I described it as a\nkind of extensible sum type. When Stephen heard me giving this description, he\nproposed that we change the type signature to reflect this. We ended up with a\nsignature that looks like this:\n\nmodule Univ : sig\n  module Variant : sig\n    type 'a t\n    val create : unit -&gt; 'a t\n  end\n\n  type t\n  val create : 'a Variant.t -&gt; 'a -&gt; t\n  val match_ : 'a Variant.t -&gt; t -&gt; 'a option\nend\n\n\n\nThe names now point you in the right direction: every time you call\nVariant.create, you’re creating a new arm of this extensible sum type. There’s\nno guarantee that two variants won’t be created with the same type, and there’s\nno need for such a guarantee.\n\nWe also changed embed and project to create and match_, to better\ntrack the terminology used in the rest of the language for constructing and\ndeconstructing sum types.\n\nAlong the way, we made one other change to Univ, which was to add some default\nfunctionality to each variant. In particular, the ability to figure out the name\nof any variant within the Univ type, and the ability to serialize a Univ.t to\nan s-expression. This is useful for dynamically browsing a collection of\nUniv.t values at run-time. 
The final interface looks like this:\n\nmodule Univ : sig\n  module Variant : sig\n    type 'a t\n    (** [create variant_name to_sexp] creates a new variant with the\n      given name and serializer *)\n    val create : string -&gt; ('a -&gt; Sexp.t) -&gt; 'a t\n  end\n\n  type t\n  val create : 'a Variant.t -&gt; 'a -&gt; t\n  val match_ : 'a Variant.t -&gt; t -&gt; 'a option\n\n  val to_sexp : t -&gt; Sexp.t\n  val variant_name : t -&gt; string\nend\n\n\n\nThe other thing that struck me about this is that when I first heard about it,\nthe idea of Univ seemed interesting but mostly a curiosity – I just didn’t\nhave any applications for it in mind.\n\nBut just because an idea isn’t useful right now doesn’t mean it’s not going to\nbecome useful later.\n",
        "url"      : "https://blog.janestreet.com/rethinking-univ/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Time to register for CUFP!",
        "date"     : "July 28, 2011",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "If you’re interested in applying functional programming to real world problems,\nthen you should consider joining us at CUFP, the Commercial Users of Functional\nProgramming workshop. It’s colocated with ICFP, which is in Tokyo this year.\nCUFP involves tutorials on a variety of FP topics, a day of talks from people\nwho have been putting FP to work, as well as BOF sessions in the evenings.\n\nOne of the tutorials will even be given by yours truly. I’m giving a tutorial on\nJane Street’s Core library.\n\nSo join us. You can read the call for participation,\nlook over the detailed schedule, and, of\ncourse, register.\n\nNote: the deadline is August 15th if you want to get reduced early registration\nfees.\n\n\n",
        "url"      : "https://blog.janestreet.com/time-to-register-for-cufp/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Dropping history with Mercurial",
        "date"     : "June 24, 2011",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 6,
        "content"  : "One feature we’ve wanted from hg for a while is the ability to drop history.\nIt’s a natural thing to want, after all; with any sufficiently active repo,\nyou’ll eventually need to drop history. For us, this is only an issue with our\nsingle most active tree, which weighs in at about 120k changesets and 2.3G.\n Given that there is no magic “drop history” button to press, what\ncan you do? One approach would be to abandon your old history entirely, copying\nthe state at tip to a brand new repo with no history at all. But this is\nproblematic. You keep your history around for a reason. You need it to handle\nmerges, and it’s nice to have when you try to understand the origin of a certain\nchange in your tree.\n\nAnother approach is to create a new repo with a subset of your history.\nhg convert can help you do that. You can pick a base revision x, and\nhg convert will create for you a new repo that has revisions corresponding to\nx and all its descendants. Note that this new tree is, in hg terminology,\nunrelated to the original, which is to say, you can’t directly merge between\nthem. This appoach will keep around a subset of your history, which if you pick\nx carefully, will let you do merges cleanly.\n\nBut there are problems here too. Cutting a new tree like this requires an\nall-at-once conversion from the full repos to the truncated ones. For a set of\ndevelopers that are actively working on dozens of branches around the clock,\nthis is hard to swallow.\n\nWe’ve found a solution to this problem that seems to get around all of these\nissues. 
We essentially maintain two worlds, a full world that has a complete\nhistory, and a truncated world that omits all history before a certain revision\nx, and we’ve built what we call a convert daemon that ferries patches back\nand forth between the two worlds.\n\nThe end result is that you can pull from and make changes to either the full or\ntruncated worlds, so it just doesn’t matter that much where you make them. This\nis allowing us to migrate slowly to the truncated repos, without introducing\ncommunication barriers between those who have and haven’t made the jump. Plus,\nyou can use the more efficient truncated world for day-to-day use, but can still\ngo back to the full world if you want to dive into older history.\n\nBuilding the Daemon\n\nSo how do you build a convert daemon? Our solution is based on hg convert,\nwhich does the grotty work of actually converting changesets from one world to\nthe other. But you need more than that to create a bi-directional bridge.\n\nThe convert daemon is built on top of two repositories, which we’ll call\nfull-convert and trunc-convert, corresponding to the full and truncated\nworld respectively. These two repos are multi-headed, and the basic workflow is\nto push a revision to one repo, and ask the daemon to convert it and push it\nto the other repo.\n\nYou can imagine the interface to the convert daemon including the following two\nfunctions:\n\n(* converts revision from full-convert to trunc-convert. Returns None if the provided\n  revision is not in full-convert. If it returns [Some rev],\n  then that rev is available in the trunc-convert repo. This\n  transformation drops history from the full world. *)\nval forward_convert : revision -&gt; revision option\n\n(* like [forward_convert], but for pushing from trunc-convert to full-convert. *)\nval backward_convert : revision -&gt; revision option\n\n\n\nThere are some invariants that should hold. 
First, for any single revision,\nmultiple calls to forward_convert (or backward_convert) should always\nproduce the same output revision. Also, running forward_convert and then\nbackward_convert should be the identity. Note that the conversion mechanism in\nhg convert is not necessarily deterministic, so to make sure that this holds,\nyou need a consistent revision map that keeps track of which revisions have been\nconverted.\n\nOnce you have this core abstraction in mind, the implementation of the daemon is\npretty straightforward. The next question is, how do you use this daemon to tie\ntogether the full and truncated worlds? For us, doing this requires figuring out\nthe interplay between the compile daemon and the convert daemon.\n\nIntegrating with the compile daemon\n\nAs described in a post I put up a few years back, our development process\ndepends critically on a compile daemon. A compile-daemon managed tree actually\nhas two related repositories, a primary repo, and a staging repo. The\nprimary repo is where you pull from to get a clean, compiling version of the\ntree. The staging repo is where you push your proposed changes for consideration\nfor inclusion into the primary repo. Staging is multi-headed, meaning that you\npush changes to it without merging.\n\nThe compile daemon’s job is to grab heads out of staging and see if they can be\nbrought into the primary repo. The compile daemon checks that a head merges\ncleanly, compiles it, and runs the unit tests. If everything passes, then the\nrevision in question (along with the new merge node) is added to the primary\ntree. Otherwise, it is marked as rejected.\n\nThe compile daemon and the convert daemon work together. Notably, you have just\none compile daemon between the full and truncated worlds, and the convert daemon\nis set up to route patches from both worlds through it. 
The following picture\ndescribes this merge flow.\n\n\n\nThe green arrow shows the actions of the compile daemon, bringing in patches\nfrom full-staging to full. (Note that we could just as well move the compile\ndaemon to the truncated world, and indeed, it would probably be marginally more\nefficient, since cloning would be faster.) Clients are shown pulling from full\nand pushing to full-staging, and similarly pulling from trunc and pushing to\ntrunc-staging.\n\nThe purple and blue arrows show the actions of the convert daemon. The convert\ndaemon pulls heads from trunc-staging and converts and pushes them to\nfull-staging. From full-staging, those patches will be considered for\ninclusion by the compile daemon. Then, whatever shows up in full will be\npushed over to trunc and trunc-staging by the convert daemon (it would be\nconfusing to have things in trunc but not trunc-staging).\n\nThat’s basically the whole system. We’ve been using it gingerly for the last\nmonth, and it seems to be working quite solidly. I expect we will transition all\nof our development over to these new truncated trees in the next couple of\nmonths.\n\nOne interesting side effect of all this is that you can do other transformations\non your tree using the convert daemon. If in the past people have committed\nabsurdly large files, for example, the convert daemon can weed those out. It\nseems like there are a lot of potential applications for this kind of\nbidirectional bridge.\n",
        "url"      : "https://blog.janestreet.com/dropping-history-with-mercurial/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Core Gems: many happy returns",
        "date"     : "April 25, 2011",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 4,
        "content"  : "One recent arrival in Core is with_return, a function that lets you return\nearly from a computation. Here’s a trivial example:\n\nlet sum_until_first_negative list =\n  with_return (fun r -&gt;\n    List.fold list ~init:0 ~f:(fun acc x -&gt;\n      if x &gt;= 0 then acc + x else r.return acc))\n\n\n\nOne thing that might not be obvious in the above example is what the type of r\nis.  It turns out it’s this:\n\ntype 'a return = { return : 'b . 'a -&gt; 'b }\n\n\n\nThe reason for this is to have the return function be truly polymorphic in its\nreturn value. That way, like raise, it can be used in any context because its\nreturn value will unify with any type.\n\nNote that the sum_until_first_negative example will work even without the\nrecord trick, because r.return is used in only one place (and more\nimportantly, with only one type.) But if you want this to work in the general\ncase, you need the record.\n\nSo, how does one go about implementing this? Here’s the implementation that’s\ncurrently in Core. Note that we use a locally defined exception, to make sure\nthat only this exception handler can catch the exceptions in question. 
Also, we\nuse a ref to store the value being returned.\n\nlet with_return f =\n  let module M =\n    struct exception Return end\n  in\n  let r = ref None in\n  let return = {\n    return = (fun x -&gt;\n      r := Some x;\n      raise M.Return);\n  }\n  in\n  try f return\n  with M.Return -&gt;\n    match !r with\n    | None -&gt; assert false\n    | Some x -&gt; x\n\n\n\nIn OCaml 3.12, we can make this code simpler, safer, and more efficient:\n\nlet with_return (type t) (f : _ -&gt; t) =\n  let module M =\n    struct exception Return of t end\n  in\n  let return = { return = (fun x -&gt; raise (M.Return x)); } in\n  try f return with M.Return x -&gt; x\n\n\n\nI’d like to be able to improve this further by getting rid of the overhead of\nthe record, but I suspect it’s not possible.\n\nIt’s worth noting that with_return has its pitfalls. For example, there’s no\nguarantee that the return always terminates the full computation. For instance,\nthe following code:\n\nlet foo = with_return\n  (fun r -&gt; try r.return 0 with _ -&gt; 1)\n\n\n\nwill return 1, not 0. Another interesting behavior is to see what happens when\nr.return is called outside the scope of the closure passed into with_return.\nConsider the following convoluted function, which returns an optional function\nwhich, when called, calls r.return.\n\nlet foo = with_return\n  (fun r -&gt; Some (fun () -&gt; r.return None))\n\n\n\nIf you call the function contained in foo, you’ll get the following response:\n\n# let f = Option.value_exn foo;;\nval f : unit -&gt; 'a = &lt;fun&gt;\n# f ();;\nException: Return 0.\n\n\n\nThis isn’t particularly surprising, once you understand the implementation, and\nit’s semantically fairly reasonable. The only thing you could really ask for\nbeyond this is a warning that the return has escaped its scope, but I think this\nis beyond the powers of the type-checker.\n",
        "url"      : "https://blog.janestreet.com/core-gems-many-happy-returns/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "OCaml is smarter than I thought",
        "date"     : "April 9, 2011",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "People (myself included) like to say that OCaml isn’t really an optimizing\ncompiler, that it has a pretty straight-ahead compilation strategy, and for the\nmost part, you get what you it looks like you get when you write the code.\n\nBut it turns out, OCaml does a little more magic than I’d counted on.\n Consider the following code:\n\nlet f x y =\n  match x,y with\n  | (0,0) -&gt;\n  true | _, _ -&gt; false\n\n\n\nI had thought that this actually allocated a tuple, and I was getting ready to\npush to try to get this fixed in the compiler. Before making a fool of myself, I\nthought I’d go and look at the generated assembly first, and lo and behold, I\nwas wrong! The compiler does what one would hope and avoids the needless\nallocation. To see what the code looked like if I forced the allocation of a\ntuple, I changed the code to pass the tuple to a tuple-taking function.\n\nlet sum (x,y) = x + y\n\nlet f x y =\n  match x,y with\n  | (0,0) as pair -&gt; ignore (sum pair); true\n  | _, _ -&gt; false\n\n\n\nI then generated the assembly, and looked again, only to discover that the\nfunction had been inlined, thus defeating the need for allocation. So, I tried\nagain, this time adding a string constant to the body of sum, which prevents\ninlining (a deficiency that ocamlpro is working on).\n\nlet sum (x,y) =\n  ignore \"z\";\n  x + y\n\nlet f x y =\n  match x,y with\n  | (0,0) as pair -&gt; ignore (sum pair); true\n  | _, _ -&gt; false\n\n\n\nI’d prevented the inlining, but there was still no allocation! Why? Well, it\nturns out that OCaml can optimize a tuple-taking function to get the elements of\nthe tuple passed in via registers, which is exactly what happened. 
And again,\nthe compiler realized that no allocation was required.\n\nFinally, I was able to trigger an allocation by changing sum to refer to the\ntuplified form of its arguments explicitly:\n\nlet sum ((x,y) as _p) =\n  ignore \"z\";\n  x + y\n\n\n\nAnd this version does, at last, allocate.\n\nAnyway, none of this is that surprising – indeed, other people at Jane Street\nknew perfectly well that OCaml did these optimizations. But it was a pleasant\nsurprise for me nonetheless.\n",
        "url"      : "https://blog.janestreet.com/ocaml-is-smarter-than-i-thought/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "OCamlPro and the future of OCaml",
        "date"     : "April 6, 2011",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "Fabrice Le Fessant has just set up a new company,\nOCamlPro, whose goal is to provide commercial support\nfor OCaml, and to make it a more effective platform for people who use OCaml as\na production tool.\n\nI think this is great news, and says good things about the future of the\nlanguage. The core team at INRIA has always done a great job of maintaining the\ncompiler, but I had gotten worried over the last few years that they had lost\ninterest in making further improvements.\n\nThe last year or so has proven that wrong. The improvements in OCaml 3.12 have\nbeen pretty dramatic. The arrival of first-class packaged modules really changes\nthe fabric of the language in a meaningful way. Notably, the core features were\nimplemented outside of INRIA, in particular, by the folk at LexiFi. And the\nimprovements are not over; there is a lot of cool stuff coming in 3.13 as well.\n\nBut there are other areas where there’s been less progress; in particular,\nimprovements to the runtime and to the surrounding toolchain have been less\ndramatic.\n\nThat’s where I’m hoping that OCamlPro will step in. We’re currently working with\nOCamlPro to improve the inlining and unboxing done by the compiler, but there\nare lots of other areas where we’re hoping that OCamlPro can drive improvements.\nHere’s a summary of some of the ideas we’ve been batting around with Fabrice and\ncompany. To be clear, this is all pretty speculative.\n\n\n  Improving support for multicore systems by making it possible to run\nmultiple OCaml runtimes on different threads.\n  Improving the OCaml toolchain so that there is better support for IDE-style\nfeatures such as autocompletion and integrated documentation.\n  Adding support for namespaces, which should improve on the current approach\nof using so-called packed modules for grouping libraries.\n  Improving support for writing type-driven code generators like sexplib and\nbin-prot without the going through camlp4. 
This should improve both speed of\ncompilation and, critically, the ease of writing such extensions.\n\n\nAnd there’s more. It’s a lot to do, and it’s not going to happen all at once.\nBut I’m hopeful that OCamlPro will become a real driver of OCaml development.\nAnd our goal in all of this is to drive these changes upstream. That’s going to\nslow things down, but I believe the discipline of getting the changes up to the\nstandards required by the core team will be worthwhile. And by keeping the work\nflowing upstream, we will avoid the problems of creating a permanent fork to the\ncompiler.\n",
        "url"      : "https://blog.janestreet.com/ocamlpro-and-the-future-of-ocaml/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Effective ML Revisited (with videos)",
        "date"     : "March 27, 2011",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "Here are the videos to go with the guest lecture I just\ngave at Harvard. It’s not too different from the one that I gave last time,\nexcept that I spent the entire time on the core points, and skipped the section\nat the end about phantom types.\n\nHere it is, in two parts.\n\n\n  \n\n\n\n  \n\n\n",
        "url"      : "https://blog.janestreet.com/effective-ml-revisited-with-videos/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Core 0.7.0 is out!",
        "date"     : "March 25, 2011",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "The newest release of Jane\nStreet’s core standard\nlibrary is now out! It’s been a while, and a lot of small things have changed in\nthe interim. Here are some of the bigger changes:\n\n\n  Support for 3.12\n  Everything is now packaged with Oasis\n  type-conv has been improved so that you don’t need to set the\nTYPE_CONV_PATH anymore.\n  Core has a new highly optimized version of Hashtables which is about as\nspace and time efficient as (ocaml’s) Hashtbl but degrades gracefully in the\npresence of hash collisions.\n  A number of functions in the List modules have been optimized (map,@…).\nThey are now about as fast as the ones in the standard library but are also\ntail recursive.\n\n\nEnjoy! \n",
        "url"      : "https://blog.janestreet.com/core-0-7-0-is-out/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Do you use FP in anger? Then talk about it at CUFP this year!",
        "date"     : "March 23, 2011",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "CUFP (Commercial Users of Functional Programming) is a yearly workshop\n(associated with ICFP) where functional programmers come together to share\nideas, and this year the workshop is going to be in Tokyo. If you have\nexperience putting functional programming to work in solving real-world tasks,\nyou should consider submitting a talk proposal to CUFP.\n\n\nYou can find the call for participation\nhere. We’re soliciting both\nexperience reports (summaries of experiences in using FP in the real world) and\nin technical talks (covering a particular technique or methodology that’s\nrelevant to the practical application of functional languages.)\n\nBut CUFP isn’t just about talks – there’s will also be a set of invited\ntutorials covering a range of topics, as well as a more informally arranged set\nof BOFs. So, if you care about applications of functional programming, you\nshould consider participating, either as a presenter or as an attendee.\n",
        "url"      : "https://blog.janestreet.com/do-you-use-fp-in-anger-then-talk-about-it-at-cufp-this-year/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Effective ML Revisited",
        "date"     : "March 9, 2011",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 8,
        "content"  : "Harvard is again teaching OCaml to its first-year students, and Greg Morrissett\nagain this year invited me to give a guest lecture. I gave a version of the\nEffective ML talk that I gave last year to the same\nclass.\n\nOCaml seems to be getting some real currency as a teaching language in the US.\nHarvard, Penn and Cornell are all teaching it to their undergraduates as part of\nthe standard curriculum. And SML has of course been taught for a long time at\nCMU, and Brown and Northeastern teach Scheme/Racket.\n\nAs a side note, the class has grown considerably, with over 200 students\nenrolled in cs51. Apparently, a lot of the growth is the result of people seeing\n“The Social Network” and getting inspired to go study Computer Science….\n\nWhen I gave this talk last, there were some requests for the code snippets, so\nI’ve included the (updated) snippets below. In order to understand the context\nof these, watch last year’s talk!\n\nUse uniform interfaces\n\nmodule type Comparable = sig\n  type t\n\n  val compare : t -&gt; t -&gt; [ `Lt | `Eq | `Gt ]\n\n  val ( &gt;= ) : t -&gt; t -&gt; bool\n  val ( &lt;= ) : t -&gt; t -&gt; bool\n  val ( = ) : t -&gt; t -&gt; bool\n  val ( &gt; ) : t -&gt; t -&gt; bool\n  val ( &lt; ) : t -&gt; t -&gt; bool\n  val ( &lt;&gt; ) : t -&gt; t -&gt; bool\n  val min : t -&gt; t -&gt; t\n  val max : t -&gt; t -&gt; t\n\n  module Map : Core_map.S with type key = t\n  module Set : Core_set.S with type elt = t\nend\n\n\n\nmodule Char : sig\n  type t\n  include Comparable with type t := t\n  include Stringable with type t := t\n  include Hashable with type t := t\nend\n\n\n\nMake illegal states unrepresentable\n\nBefore:\n\ntype connection_state =\n| Connecting\n| Connected\n| Disconnected\n\ntype connection_info = {\n  state: connection_state;\n  server: Inet_addr.t;\n  last_ping_time: Time.t option;\n  last_ping_id: int option;\n  session_id: string option;\n  when_initiated: Time.t option;\n  when_disconnected: 
Time.t option;\n}\n\n\n\nAfter:\n\ntype connecting = { when_initiated: Time.t; }\ntype connected = { last_ping : (Time.t * int) option;\n                   session_id: string; }\ntype disconnected = { when_disconnected: Time.t; }\n\ntype connection_state =\n| Connecting of connecting\n| Connected of connected\n| Disconnected of disconnected\n\ntype connection_info = {\n  state : connection_state;\n  server: Inet_addr.t;\n}\n\n\n\nCode for exhaustiveness\n\ntype message = | Order of Order.t\n               | Cancel of Order_id.t\n               | Exec of Execution.t\n\nlet position_change m =\n  match m with\n  | Exec e -&gt;\n    let dir = Execution.dir e in\n    Dir.sign dir * Execution.quantity e\n  | _ -&gt; 0\n\n\n\nOpen few modules\n\nBefore:\n\nopen Core.Std\nopen Command\nopen Flag\n\ntype config = { exit_code: int;\n                message: string option; }\n\n\nlet command =\n  let default_config = { exit_code = 0; message = None } in\n  let flags =\n    [ int \"-r\" (fun cfg v -&gt; { cfg with exit_code = v });\n      string \"-m\" (fun cfg v -&gt; { cfg with message = v });\n    ]\n  in\n  let main cfg =\n    Option.iter cfg.message (fun x -&gt; eprintf \"%s\\n\" x);\n    cfg.exit_code\n  in\n  create ~summary:\"does nothing, successfully\"\n    ~default_config ~flags ~main\n\nlet () = run command\n\n\n\nAfter:\n\nopen Core.Std\n\ntype config = { exit_code: int;\n                message: string option; }\n\n\nlet command =\n  let default_config = { exit_code = 0; message = None } in\n  let flags =\n    let module F = Command.Flag in\n    [ F.int \"-r\" (fun cfg v -&gt; { cfg with exit_code = v });\n      F.string \"-m\" (fun cfg v -&gt; { cfg with message = v });\n    ]\n  in\n  let main cfg =\n    Option.iter cfg.message (fun x -&gt; eprintf \"%s\\n\" x);\n    cfg.exit_code\n  in\n  Command.create ~summary:\"does nothing, successfully\"\n    ~default_config ~flags ~main\n\nlet () = Command.run command\n\n\n\nMake common errors obvious\n\nmodule List : sig\n  type 'a 
t\n\n  ....\n\n  val find : 'a t -&gt; ('a -&gt; bool) -&gt; 'a option\n  val find_exn : 'a t -&gt; ('a -&gt; bool) -&gt; 'a\n\n  val hd : 'a t -&gt; 'a option\n  val hd_exn : 'a t -&gt; 'a\n\n  val reduce : 'a t -&gt; ('a -&gt; 'a -&gt; 'a) -&gt; 'a option\n  val reduce_exn : 'a t -&gt; ('a -&gt; 'a -&gt; 'a) -&gt; 'a\n\n  val fold : 'a t -&gt; init : 'b -&gt; ('a -&gt; 'b -&gt; 'b) -&gt; 'b\n\n  .....\n\nend\n\n\n",
        "url"      : "https://blog.janestreet.com/effective-ml-revisited/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "A trick: recursive modules from recursive signatures",
        "date"     : "October 1, 2010",
        "authorId" : "nlinger",
        "author"   : "Nathan Linger",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "Stephen taught me a neat trick a while back. Suppose you want to define a some\nmutually recursive types\n\ntype even = Zero | Even_succ of odd\nand odd = Odd_succ of even\n\n\n\nNow suppose you want to do this in such a way that each type belongs to its own\nmodule. Since OCaml requires signature annotations in recursive module\ndefinitions, I thought this required one to write out the type definitions\ntwice.\n\nmodule rec Even : sig\n  type t = Zero | Succ of Odd.t\nend = struct\n  type t = Zero | Succ of Odd.t\nend\nand Odd : sig\n  type t = Succ of Even.t\nend = struct\n  type t = Succ of Even.t\nend\n\n\n\nHowever, Stephen showed me the following trick\n\nmodule rec Even : sig\n  type t = Zero | Succ of Odd.t\nend = Even\nand Odd : sig\n  type t = Succ of Even.t\nend = Odd\n\n\n\nWhoa! We’re seemingly defining some modules out of thin air! This looks very\nanalogous to the ill-founded definitions\n\nlet rec even : some_type_for_even = even\n  and odd : some_type_for_odd = odd\n\n\n\nBut since we’re only defining types here, this trick cannot cause undefined\nvalues to sneak into our program. We have effectively gotten OCaml to infer the\ndefinition of a module from its signature in the special case where the module\nonly contains type definitions (it may also contain module type definitions).\n\nMutual recursion is not required for this to work. You can also wrap everything\nup into a single recursively defined parent module if you like.\n\nmodule rec Layered : sig\n  module Even : sig\n    type t =\n    | Zero\n    | Succ of Layered.Odd.t\n  end\n  module Odd : sig\n    type t =\n    | Succ of Layered.Even.t\n  end\nend = Layered\n\n\n\nSadly, this trick is somewhat limited in that it doesn’t work with our\nType_conv pre-processors since there are only type specifications here and not\ntype definitions upon which to hang a “with sexp” (for example).\n",
        "url"      : "https://blog.janestreet.com/a-trick-recursive-modules-from-recursive-signatures/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "BOFs, Tutorials and Talks, oh my!",
        "date"     : "August 25, 2010",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "I’m on the program committee for CUFP this year, so I’m a bit biased, but I feel\nvery good about this year’s program. For the first time, CUFP will be broken up\ninto three parts:\n\n\n  CUFP Tutorials on Friday October 1st.\nThis is really the descendent of last year’s DEFUN workshop. The tutorials\nwere picked carefully, both for the interest of the topic and the quality of\nthe teacher.\n  CUFP Talks on Saturday October 2nd.\nHaving been involved for a few years now, I really think it’s an unusually\nstrong group of talks. I would be pretty happy if we had a schedule\npopulated with the best of the talks that we rejected, much less the ones\nthat we ended up accepting.\n  \n    CUFP BOFs on the evenings of Thursday and\nFriday (Sep 30th and Oct 1). I’m really looking forward to these. These BOFs\nare still being organized, so you should follow the link and see if you have\nideas to contribute. The BOFs should hopefully attract people from outside\nthe usual CUFP audience, and we’re hoping it will be a good way for FP\ndevelopers to get together, talk about issues important to the various and\nsundry FP communities, and really get some work done.\n\n    \n    So, if you’re interested, register\nhere. Note that CUFP\nis being run as part of ICFP and the family of related workshops, so you go\nthrough the same registration process.\n  \n\n\nSee you in Baltimore!\n",
        "url"      : "https://blog.janestreet.com/bofs-tutorials-and-talks-oh-my/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Effective ML video",
        "date"     : "August 21, 2010",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "A while back I mentioned that I’d given a guest lecture at\nclasses at Harvard and Northeastern, and that the Harvard class had been taped.\nI finally got and uploaded that video:\n\n\n  \n\n\nSadly, the code samples are in spots a little hard to see here. If there’s\nenough interest, I’ll post the code samples here.\n\nWhile I’m at it, here’s a repost of the “Caml Trading” talk I gave at CMU. This\nolder talk is more about Jane Street’s business model, and why we think OCaml is\na good fit for it, but there is some overlap between the talks.\n\n\n  \n\n\n\n\n*Apply for a job at Jane Street.\nYou have nothing to lose but your opportunity to program in Java. *\n",
        "url"      : "https://blog.janestreet.com/effective-ml-video/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "More expressive GADT encodings via first class modules",
        "date"     : "August 4, 2010",
        "authorId" : "nlinger",
        "author"   : "Nathan Linger",
        "tags"     : [],
        "minsToRead" : 14,
        "content"  : "GADTs allow one to statically enforce stronger program invariants than are\notherwise possible in a Hindley-Milner style type system. This post retells the\nstory of how to “roll your own” GADTs using an explicit type of equality\nconstraints. More interestingly, we discuss a particularly versatile definition\nof type equality in Haskell that can now be transcribed into OCaml due to the\nrecent addition of first class modules.\n\nGADTs\n\nThe acronym GADT stands for Generalized Algebraic DataType. One definition of a\nGADT is a parameterized algebraic datatype in which type parameters may vary\nfrom one constructor to the next. Consider the following example (adapted from\nthe Haskell Wiki’s GADT page, which\nis also a good starting point for papers on this sort of thing):\n\ndata Term x where\n  Lit :: Int -&gt; Term Int\n  Pair :: Term a -&gt; Term b -&gt; Term (a, b)\n  Fst :: Term (a, b) -&gt; Term a\n  Snd :: Term (a, b) -&gt; Term b\n\n\n\nThis type is like a variant type in OCaml except that (1) the constructors’ type\nsignatures are given explicitly as opposed to the standard compact BNF\ngrammar-like syntax for algebraic datatypes, and (2) the constructors do not\nuniformly return Term x. Instead, the type parameter x in the return type\nvaries from constructor to constructor.\n\nIn this particular example, GADTs are used to ensure that well-typed values of\ntype Term a always represent values of type a. In other words, by embedding\nthe type system of the defined language (Term in this example) into the type\nsystem of the defining language (Haskell in this example), we are guaranteed\nthat type errors in the defined language are caught by the type checker of the\ndefining language. The following idealized interpreter session shows this\nbehavior.\n\nghci&gt; :t (\\x -&gt; Pair (Lit 5) (Fst x))\nforall a b. Term (a, b) -&gt; Term (Int, a)\nghci&gt; Fst (Lit 5)\nError: ... 
expected type \"(a, b)\" but inferred type \"Int\" ...\n\n\n\nPattern matching rules for GADTs are also stronger than those for vanilla\ndatatypes in that the type checker may discover additional constraints about a\ntype that hold in the context of a particular pattern matching branch. Consider\nthe following evaluator:\n\neval :: Term a -&gt; a\neval (Lit i) = i\neval (Pair x y) = (eval x, eval y)\neval (Fst xy) = x where (x, y) = eval xy\neval (Snd xy) = y where (x, y) = eval xy\n\n\n\nIn the second clause of the definition of eval, the type checker learns from\nthe type of the constructor Pair that a = (b, c) for some b and c. This\ninformation is used to justify returning a value of type (b, c) where a value\nof type a was expected.\n\nThe statically typed evaluator is a well-worn example of GADTs. A perhaps\nless-worn example is type-preserving rewrite rules.\n\nsimplify :: Term a -&gt; Term a\nsimplify (Fst (Pair x _)) = simplify x\nsimplify (Snd (Pair _ y)) = simplify y\nsimplify other = other\n\n\n\nghci&gt; eval (Fst (Pair (Lit 5) (Lit 8)))\n5\nghci&gt; simplify (Fst (Pair (Lit 5) (Lit 8)))\n(Lit 5)\n\n\n\nRegarding other applications of GADTs, a future post will recount how to write\nprograms exhibiting “intensional polymorphism” using a GADT of type\nrepresentations a la Stephanie Weirich and then show a nice way to represent a\nsignificant subset of OCaml types this way.\n\nPopular language extensions allow one to program directly with GADTs in Haskell\nas shown here. However, plain Haskell allows us to “roll our own” GADTs with a\nlittle extra work.\n\nEncoding GADTs with type equality\n\nThe above description of pattern matching with GADTs hints at a useful fact: the\nextra power afforded by GADTs has to do with tracking and applying additional\ntype equality constraints. 
What proper GADTs do for us automatically, we may do\nfor ourselves if we have a way to manipulate type equalities.\n\nTo this end, assume we have a type Equal a b that stands for the proposition\nthat a and b are equal types. Furthermore, assume that this type supports\nthe following operations.\n\nrefl :: Equal a a\nsymm :: Equal a b -&gt; Equal b a\ntrans :: Equal a b -&gt; Equal b c -&gt; Equal a c\nlift :: Equal a b -&gt; Equal (f a) (f b)\ncoerce :: Equal a b -&gt; a -&gt; b\n\n\n\nViewed through the propositions-as-types lens, the types of refl, symm, and\ntrans say that Equal is an equivalence relation, while the type of lift\nsays that every type constructor f preserves Equal.\n\nFinally the coerce function says what we can do with values of type\nEqual a b, namely coerce values from type a into an equivalent type b.\nSince a and b are equal, we expect such a coercion to behave like the\nidentity function. In particular, coerce should in no way inspect its second\nargument.\n\nThe above definition of Term may then be encoded in terms of Equal as\nfollows:\n\ndata Term x\n = Lit (Equal x Int) Int\n | forall a b. Pair (Equal x (a, b)) (Term a) (Term b)\n | forall b. Fst (Term (x, b))\n | forall a. Snd (Term (a, x))\n\n\n\nThe last three constructors here make use of existentially quantified type\nvariables (somewhat confusingly introduced with the keyword forall). 
This\ndefinition yields constructors with the following types\n\nLit :: Equal x Int -&gt; Int -&gt; Term x\nPair :: Equal x (a, b) -&gt; Term a -&gt; Term b -&gt; Term x\nFst :: Term (a, b) -&gt; Term a\nSnd :: Term (a, b) -&gt; Term b\n\n\n\nPartially applying Lit and Pair constructors to refl yields more constructor\nfunctions with more natural types.\n\nnum :: Int -&gt; Term Int\npair :: Term a -&gt; Term b -&gt; Term (a, b)\n\nnum = Lit refl\npair = Pair refl\n\n\n\nNote that these types are the same as those declared for constructors of the\nGADT version of Term.\n\nWhen pattern matching, we will find it useful to explicitly manipulate equality\nconstraints in the form of values of type Equal a b for particular types a\nand b. The eval example becomes\n\neval :: Term a -&gt; a\neval (Lit eq i) = coerce (symm eq) i\neval (Pair eq a b) = coerce (symm eq) (eval a, eval b)\neval (Fst a) = x where (x, y) = eval a\neval (Snd a) = y where (x, y) = eval a\n\n\n\nThe use of equality values here makes precise the informal reasoning given above\nfor why a GADT-aware type checker accepted the previous version of eval. The\nburden on the programmer is non-trivial, but we may be willing to pay the price\nin order to squeeze more assurances from the type system.\n\nUser-defined type equality\n\nHow might one implement Equal? Amazingly, there is a GADT-free definition for\nEqual involving polymorphism over type constructors rather than over types,\nnamely\n\ndata Equal a b = Coerce (forall f. f a -&gt; f b)\n\n\n\nAs a logical formula, this definition says that a equals b if every property\nf that holds of a also holds of b. 
The type variable f in this\ndefinition ranges over type constructors (like OCaml’s option and list)\nrather than types (like OCaml’s int and string).\n\nReflexivity and transitivity are easy to prove with this definition.\n\nrefl :: Equal a a\nrefl = Coerce id\n\ntrans :: Equal a b -&gt; Equal b c -&gt; Equal a c\ntrans (Coerce f) (Coerce g) = Coerce (g . f)\n\n\n\n(Note that the infix dot is Haskell’s function composition operator). Symmetry\nis more difficult, so we leave it for later. Coercion may be defined by\ninstantiating f to the identity type constructor.\n\nnewtype Id a = Id { unId :: a }\n\ncoerce :: Equal a b -&gt; a -&gt; b\ncoerce (Coerce f) = unId . f . Id\n\n\n\nThe idiom used in the definition of Id gives us both an injection function\nId :: a -&gt; Id a as well as a projection function unId :: Id a -&gt; a. The\nkeyword newtype means that this type definition is treated as a type synonym\nat runtime, so that Id and unId are both implemented as the identity\nfunction and therefore serve only to guide the typechecker. The definition of\nlift is a variation on this theme.\n\nnewtype Compose f1 f2 a = Compose { unCompose :: f1 (f2 a) }\n\nlift :: Equal a b -&gt; Equal (f a) (f b)\nlift (Coerce f) = Coerce (unCompose . f . Compose)\n\n\n\nThough the definition of Equal seems very asymmetric, one may define symm by\ninstantiating the argument coercion to the property f c = Equal c a, which\nmust hold of b since it holds of a (via refl).\n\nnewtype FlipEqual a c = Flip { unFlip :: Equal c a }\n\nsymm :: Equal a b -&gt; Equal b a\nsymm (Coerce f) = (unFlip . f . Flip) refl\n\n\n\nTranscription into OCaml\n\nHow might we transcribe this implementation of Equal into OCaml? The most\ndifficult aspect seems to be the universal quantification over a type\nconstructor rather than over a regular type. 
Parameterization over a type\nconstructor is something that can be done in OCaml only via a functor.\nFortunately, OCaml 3.12’s new first class modules allow us to embed modules and\nfunctors into the value world.\n\nmodule type Equal = sig\n  type fst\n  type snd\n  module Coerce :\n    functor (F : sig type 'a t end) -&gt; sig\n      val f : fst F.t -&gt; snd F.t\n  end\nend\n\ntype ('a, 'b) equal = (module Equal with type fst = 'a and type snd = 'b)\n\n\n\nAfter making this initial step, I found transcribing the remaining Haskell\ndefinitions into OCaml to be a useful exercise in learning to program with first\nclass modules. Following Stephen’s lead in an earlier post, I will give the\nreader a chance to work out the remaining definitions on his/her own before\nposting my solution.\n\nComparison with previous implementations\n\nDefining type equality in OCaml is not a new idea. Oleg Kiselyov has a clever\nimplementation\nthat only requires explaining away a single unreachable exception-raising\nexpression. Similarly, a simple pair-of-coercions implementation of type\nequality features in one of the examples of first class modules in the OCaml\n3.12 reference\nmanual.\nSo what value is added by importing the implementation used in pure Haskell?\n\nThe answer lies in the lift combinator, which is not supported by previous\nOCaml implementations of type equality. Without it, consider what one would have\nto do to coerce an 'a list into a 'b list given a value of type\n('a, 'b) Equal.t. It seems clear that one must essentially map an 'a to 'b\ncoercion across the list, therefore copying the spine. The implementation\npresented here, however, allows one to do the coercion without ever inspecting\nthe list.\n\nFor lists, this may not be so bad. But in general, the occurrence of the type\nparameter 'a may be arbitrarily deep in the definition of a type constructor,\nand the deeper the occurrence, the more costly the traversal.\n",
        "url"      : "https://blog.janestreet.com/more-expressive-gadt-encodings-via-first-class-modules/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "OCaml as a scripting language",
        "date"     : "July 23, 2010",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "There is a common perception that you should choose your type system based on\nthe scale of your project. If you’re writing a little program (i.e., a\nscript), you should use a dynamically typed language like Python or Ruby. If\nyou’re building a large complicated piece of software, you’re better off with a\nstatically typed language like Java or (for the adventurous) OCaml.\n\nI’ve always suspected that this is wrong; that static types improve your life\neven when writing small programs, although the advantage is clearly smaller. At\nleast, this should be true for statically typed languages like OCaml and Haskell\nwhose syntactic overhead is fairly low.  But I’ve never been able to\nfully convince myself of this, because there has always been a set of practical\nbarriers that prevented me from using my favorite language, OCaml, in this role.\nFor one thing, the standard OCaml distribution is very much a “batteries not\nincluded” affair. OCaml’s standard library is well implemented, but it is small\nand idiosyncratic. Our own internal standard library,\n\ncore, is much more complete, but we don’t want to check out and compile within\nour massive source tree every time we write a one-off script. And we don’t want\nto have to keep these scripts up-to-date as our libraries evolve. Languages like\nPython and Perl do a great job of providing a stable platform for little\nprograms.\n\nA few months ago, we addressed some of OCaml’s limitations in this regard by\nperiodically stamping out “ocaml scripting environments”: custom OCaml toplevels\nwith some of our standard libraries baked in. You can use these interpreters as\nyou’d use a python executable, by stuffing it into a #! declaration at the top\nof a script. 
And we don’t uninstall old versions when we roll new ones, so\nscripts based on an older install can continue working indefinitely without\nmodification.\n\nNow that we have this at our disposal, it’s confirmed my suspicion that typed\nscripting is a net win. We’ve started using this for more and more, for\neverything from auto-generation of config files to nagios checks to scripts for\nsending emails.\n\nMy conclusion is that OCaml with the right libraries can be very lightweight,\nand that the type system is a big time-saver. When I create a new ocaml script,\nthe type errors that the compiler detects are almost universally real bugs, and\nas such reduce the amount of time it takes me to go from nothing to a working\nscript.\n\nAnd to boot, when our first version is in OCaml rather than Bash or Ruby, it’s\neasier (for us, anyway) to scale that script up to a real program if the need\narises.\n\n\n\nAre you a sysadmin who’s not afraid of closures? Then break out of your shell,\nand apply for a job at Jane\nStreet.\n",
        "url"      : "https://blog.janestreet.com/ocaml-as-a-scripting-language/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Ensuring that a function is polymorphic in Ocaml 3.12",
        "date"     : "July 16, 2010",
        "authorId" : "nlinger",
        "author"   : "Nathan Linger",
        "tags"     : [],
        "minsToRead" : 8,
        "content"  : "The beta version of Ocaml 3.12\nhas a couple of new features that relate to\na post Stephen wrote a while back\non how to ensure that a function definition is polymorphic. In this follow up\npost I will describe how one of those new mechanisms is essentially what you\nwant for this purpose and the other is perhaps not due to a subtle interaction\nwith how recursive definitions are type-checked.\n\nPolymorphic type annotations\n\nThe first new feature is simply the ability to directly annotate definitions\nwith polymorphic types. Consider\n\nlet const : 'a 'b. 'a -&gt; 'b -&gt; 'a =\n  fun x y -&gt; x\n\n\n\nThe function const is explicitly declared to have the polymorphic type\n'a -&gt; 'b -&gt; 'a.\n\nAs Stephen noted, this is not what is meant by free type variables in type\nannotations. For example, the following example with seemingly analogous type\nannotations (but a change in behavior)\n\nlet wrong_const (x : 'a) (y : 'b) : 'a = y\n\n\n\nis accepted, but assigned a less general type than const, namely,\n\nval wrong_const : 'a -&gt; 'a -&gt; 'a\n\n\n\nbecause the type variables in wrong_const indicate merely unspecified types\nwhereas the universally quantified type variables in const indicate arbitrary\ntypes. The former are used to ensure that multiple occurrences of the same type\nvariable must refer to the same type within a particular definition. The latter\nare used to ensure that a type is considered equal only to itself within a\nparticular definition.\n\nNote that a polymorphic type annotation holds inside the body of a recursive\ndefinition as well as outside, allowing what is known as polymorphic\nrecursion, where a recursive call is made at some non-trivial instantiation of\nthe polymorphic type.\n\ntype 'a perfect_tree = Leaf of 'a | Node of 'a * ('a * 'a) perfect_tree\n\nlet rec flatten : 'a. 
'a perfect_tree -&gt; 'a list = function\n| Leaf x -&gt; [x]\n| Node (x, t) -&gt;\n  let pairs = flatten t in\n  let xs =\n   List.fold_right pairs ~init:[]\n    ~f:(fun (x1, x2) xs -&gt; x1 :: x2 :: xs)\n  in x :: xs\n\n\n\nThe recursion in the definition of flatten is polymorphic because the\nrecursive call to flatten is only well-typed if we instantiate its declared\npolymorphic type to ('a * 'a) perfect_tree -&gt; ('a * 'a) list.\n\nSeveral examples in Chris Okasaki’s classic book Purely Functional Data\nStructures\ninvolve so-called non-regular datatypes like perfect_tree. In order to\ndefine any useful functions on such types, one needs polymorphic recursion.\nUntil 3.12 one had to resort to various tricks involving recursively defined\nmodules or records in order to get Ocaml to accept a polymorphically recursive\nfunction definition. In 3.12 we can now express such definitions much more\ndirectly.\n\nExplicit type parameters\n\nThe second new feature in 3.12 relating to polymorphism is explicit type\nparameters. For non-recursive definitions, this feature may be used to\naccomplish the same thing as polymorphic type annotations.\n\nlet const' (type a) (type b) (x : a) (y : b) : a = x\n\n\n\nyields\n\nval const' : 'a -&gt; 'b -&gt; 'a\n\n\n\nIf we make a mistake in the definition\n\nlet wrong_const' (type a) (type b) (x : a) (y : b) : a = y\n\n\n\nthe type checker lets us know\n\nError: This expression has type b but an expression was expected of type a\n\n\n\nContrast this with the wrong_const example in the previous section.\n\nHowever, this simple understanding of explicit type parameters in terms of\npolymorphism is, sadly, not quite true when one considers recursive functions.\nFor instance, I was surprised to find that the following function type checks.\n\nlet rec f (type a) (x : a) : unit = f 42\n\n\n\nThe assigned type is\n\nval f : int -&gt; unit\n\n\n\nWhat is going on here is quite subtle. 
The typing rule for fun (type t) -&gt; E\nconsiders t abstract while typing E, but elsewhere it is considered to be\njust another unifiable type variable. The typing rule for let rec f = E deals\nwith two types for the recursively defined value,\n\n\n  the type bound to f while checking the body E (which is constrained\naccording to how f is used in E), and\n  the type inferred for its body E.\n\n\nThe last step after checking E is to unify these two types together and\ngeneralize the one resulting type to obtain the (possibly polymorphic) type of\nf used henceforth.\n\nIn the example above, the type inferred for the variable f is int -&gt; unit.\nThe body of the recursive definition is\n\nfun (type a) (x : a) -&gt; (f 42 : unit)\n\n\n\nThe type inferred for this expression is '_a -&gt; unit (the type variable is no\nlonger held abstract since it is outside the scope of the explicit type\nparameter). These two happily unify and the resulting type is int -&gt; unit.\n\nIt seems to me that a simple syntactic trick suffices to force the otherwise\nfaulty polymorphic interpretation of explicit type parameters to hold. Simply\npush the recursion into the scope of the type parameter by transforming\n\nlet rec f (type t) = E\n\n\n\ninto\n\nlet f (type t) = let rec f = E in f\n\n\n\nIn our case, we transform\n\nlet rec f (type a) = fun (x:a) -&gt; (f 42 : unit)\n\n\n\ninto\n\nlet f (type a) =\n  let rec f = fun (x:a) -&gt; (f 42 : unit) in\n  f\n\n\n\nwhich, as hoped, fails with the message\n\nError: This expression has type int but an expression was expected of type a\n\n\n\nConclusion\n\nThe upshot is that polymorphic type annotations are the preferred way to ensure\npolymorphism outright. 
They play well with recursive definitions, even going so\nfar as to support polymorphic recursion.\n\nExplicit type parameters, on the other hand, very nearly allow one to state\nrequirements about polymorphic aspects of a function, but this only works\nstraightforwardly for non-recursive definitions, and requires some small\nrewriting to work for recursive definitions. Even then, polymorphic recursion is\nbeyond the scope of this mechanism.\n\nTo be fair, it seems that explicit type parameters were not primarily intended\nto indicate polymorphism. Their chief purpose, rather, is to give one a way to\nrefer to the type parameters of a function when defining type components of\nfirst-class modules. The polymorphism intuitions are strong, however, and it\nseemed worthwhile to explore their limits.\n\n\n\nDo you want to work in a place that understands why functional programming\nmatters? Then join us.\n",
        "url"      : "https://blog.janestreet.com/ensuring-that-a-function-is-polymorphic-in-ocaml-3-12/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Making something out of nothing (or, why None is better than NaN and NULL)",
        "date"     : "July 15, 2010",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 14,
        "content"  : "Null is a pervasive concept in computing. Virtually all programming languages\nhave a way of expressing nothing, nullity, no answer. But handling nulls\ncorrectly turns out to be tricky, and many of the contexts in which you find\nnulls, you’ll also find confusing and error-prone semantics surrounding them.\n\nThe heart of the problem is that, in an attempt to make programming with null\neasier, nulls are often propagated implicitly through computations, allowing the\nprogrammer to write code that deals with nulls without explicitly contemplating\nhow nulls should be dealt with.\n\nMy experience has been that this is a mistake; that if you want robust,\neasy-to-reason-about code, the programmer must think explicitly about how to\nhandle null cases, and that programming languages would do well to provide\nprogrammers with good support for the requisite case analysis\n\nThe point can be illustrated by considering some of the contexts in which null\narises.\n\nIEEE Floating point arithmetic\n\nThis is an oft-forgotten case, but an important one. The null value in this case\nis NaN, which stands for not-a-number. NaN is the value you get when a basic\narithmetic operation has no reasonable answer, e.g. zero times infinity.\n\nIn the IEEE standard, NaN behaves quite differently from other floats. The\nstandard requires that applying the primitive arithmetic operations to NaN\nalways produces NaN. At first glance, this seems sensible. What’s 3 times\nI-don't-know? Clearly it’s I-don't-know!\n\nBut what about primitive operations that don’t return floats, like comparisons?\nAccording to the standard, comparison functions return false when one of their\narguments is NaN. Now this should make you nervous, since it’s not at all\nobvious that the value of I-don't-know &gt; 4 should be false.\n\nIndeed, this behavior violates a bunch of fundamental invariants. 
For example,\nit’s simply not the case that x &gt; y is equivalent to not (x &lt;= y) when x\nor y are NaN. Also, comparison functions don’t give a total order on floats\n– see what happens to your favorite binary tree when you throw in some NaNs.\nReflexivity doesn’t even hold, since NaN = NaN is false!\n\nTo see how weird the results of this can be, consider the following function for\ncomputing the max of two numbers (the example is in OCaml, but the same behavior\nwill show up in any language that provides IEEE compliant floating point\nsemantics):\n\nlet max x y =\n  if x &gt; y then x else y\n\n\n\nIf either x or y is NaN, the test will return false, and y will be\nchosen. Thus, max NaN 3 is 3, and max 3 NaN is NaN. Note that max\ndoesn’t have the null-propagation property that the IEEE standard wanted for\nprimitive float-producing operations. max doesn’t even have the symmetry\nproperty you would expect: max x y is not the same as max y x.\n\nHere’s another example. Consider the following two functions, which are intended\nto validate whether the provided float is between 1 and 10.\n\ntype result = Fail | Pass\n\nlet check1 x =\n  if x &lt; 1. || x &gt; 10. then Fail else Pass\n\nlet check2 x =\n  if x &gt;= 1. && x &lt;= 10. then Pass else Fail\n\n\n\nThe two functions certainly look equivalent, and if you’re not clued in to the\nstrange semantics of NaN, you might be surprised to discover that check1\nreturns Pass for NaN, whereas check2 returns Fail.\n\nSQL’s NULL\n\nMuch ink has been spilled about the problems associated with SQL’s NULL. The\nwikipedia page on NULL is a\npretty good summary, so instead of trying to give a complete picture here, I’ll\njust give a taste of how NULL works in SQL, and what the associated problems\nare.\n\nAt first glance, SQL’s NULL looks a lot like NaN. If you call a simple\nfunction and give it NULL as one of its arguments, the result will be NULL\nas well e.g., 3 + NULL is NULL. 
But unlike NaN, SQL’s NULL is not\nrestricted to floating point numbers; you can have a NULL in a column of\narbitrary type.\n\nLet’s see what happens when we start doing comparison functions in SQL. It turns\nout that SQL propagates a kind of NULL into boolean expressions as well. This\nseems more consistent than the behavior with NaN. Indeed, the odd way that\nNaN propagates into comparison functions is what sunk our implementation of\nmax, and so you might expect max implemented in SQL will work properly.\nLet’s try. Assume we have a table t which contains some NULLs:\n\n|    i |    j |\n|------|------|\n|    1 |    2 |\n| NULL |    2 |\n|    1 | NULL |\n\n\n\nIf we compute the max of these two columns, we should expect the result to be:\n\n|    i |    j | max  |\n|------|------|------|\n|    1 |    2 | 2    |\n| NULL |    2 | NULL |\n|    1 | NULL | NULL |\n\n\n\nHere’s some SQL code for computing the max from two columns.\n\nSELECT\ni, j,\nCASE WHEN i &gt; j THEN i ELSE j END\nAS max\nFROM t\n\n\n\nSadly, we see the exact same bizarre behavior we got out of our floating-point\nmax function.\n\n|    i |    j |  max |\n|------|------|------|\n|    1 |    2 |    2 |\n| NULL |    2 |    2 |\n|    1 | NULL | NULL |\n\n\n\nWhy is this? Because SQL doesn’t hold its ground when it comes to the way it\nhandles NULL in conditional expressions. A NULL condition is treated as\nequivalent to false, where the more consistent behavior would be for the entire\nCASE expression to evaluate to NULL.\n\nThe root of the problem here is the attempt to pick a reasonable default\nbehavior in the presence of NULL in a way that doesn’t just scuttle the entire\ncomputation. Saving such a computation isn’t hopeless – these null-handling\nheuristics often produce reasonable answers. 
But they can produce unreasonable\nanswers as well, and the end result is that the behavior of SQL in the presence\nof NULL is inconsistent and confusing.\n\nThe above example is really just the tip of the iceberg. You can see examples of\nstrange behavior with aggregate functions, selects and joins as well.\n\nFundamentally, both SQL’s NULL and the floating-point NaN fail because the\nchoice of how to salvage a calculation that encounters null depends on things\nthat can not be known by the heuristic.\n\nOne good aspect of SQL’s NULL handling is that SQL provides ways of enforcing\nconstraints that given columns contain no NULLs. Given the odd behavior of\nNULLs, it’s an essential feature.\n\nNull references\n\nAnother common form of null is the null reference (or null pointer). Null\nreferences appear in most mainstream statically typed programming languages,\nincluding C, C++, C#, Java. In the following, I’ll talk about Java’s null\nreferences, but the same basic issues show up in many languages.\n\nUnlike IEEE floating point arithmetic and SQL, Java does not try to\nautomatically salvage computations that encounter nulls. Instead, any\ncomputation that tries to make a method call on a null reference will result in\nan exception. In some sense, exceptions are their own kind of null, a special\nvalue that a function can return when it can’t return anything else.\n(non-termination is yet another way for a computation to refrain from returning\nan ordinary value.)\n\nThe problem with null references is that while their handling is explicit at the\nlevel of values, it’s implicit at the level of types. Java’s type system allows\nfor all object references to potentially be null references, even though for\nmost variables most of the time, null will never show up. This is unlike SQL,\nwhere some values can be declared as non-NULL.\n\nBecause the type system gives you no clue as to the presence of nulls, null\nreference exceptions are rather ubiquitous in Java programs. 
Indeed, elaborate\nsystems like ESC Java (ESC stands for “Extended Static Checking”) have been\ndeveloped to enforce various static checks, prominent among them the elimination\nof runtime null reference exceptions.\n\nAs a side note, it’s interesting that Java has such a loosey-goosey approach to\nnull object references, when it has a fairly strict approach to exception\nhandling. Indeed, Java requires one to be quite strict and explicit when dealing\nwith so-called checked exceptions.\n\nML’s option type\n\nThe problem with Java’s null references is different from the problem with SQL\nand IEEE floating point arithmetic. In those two cases, the system tries to “do\nthe right thing” when null comes up, and that right thing turns out to sometimes\nbe wrong; the problem with Java is that it doesn’t provide sufficient tools in\nthe language to ease the programmer’s life in dealing with nulls.\n\nLanguages like ML and Haskell that are based on the Hindley-Milner type system,\non the other hand, provide quite powerful tools for null handling. Rather than\ntalk about such languages collectively, I’ll talk in terms of the instance I\nknow best, OCaml. In OCaml, null is not lurking in every type. If a variable has\ntype int array, then that variable is always populated with an int array,\nand never with null.\n\nWhen you need a variable whose value might or might not be there, it is modeled\nexplicitly in the type system. One common way of doing so is using what’s called\nan option. The option type is not a primitive of the language, but an\nordinary data type, which can be declared as follows:\n\ntype 'a option =\n  | None\n  | Some of 'a\n\n\n\nThis is a tagged union with two cases: None, indicating the lack of a value;\nand Some, indicating the presence of a specified value. 
To see how this might\nbe used in practice, consider the following simple hashtable interface:\n\nmodule Table : sig\n  type ('key,'data) t\n\n  val create : unit -&gt; ('key,'data) t\n  val find : ('key,'data) t -&gt; 'key -&gt; 'data option\n  val replace : ('key,'data) t -&gt; 'key -&gt; 'data -&gt; unit\nend\n\n\n\nNotice that find returns an optional value. The reason is that find may fail\nto find an entry corresponding to the given key. This is captured in the type:\nthe return value is of type 'data option rather than simply 'data.\n\nIn addition to allowing one to express in the type system that a value may be\nmissing, OCaml also provides powerful tools for doing the corresponding case\nanalysis. Here’s a simple example that deals with options explicitly using match\nstatements.\n\nlet increment_count table key =\n  let current_count =\n    match Table.find table key with\n    | None -&gt; 0\n    | Some x -&gt; x\n  in\n  Table.replace table key (current_count + 1)\n\n\n\nHere we explicitly state that we want the null case (where the key can’t be\nfound in the table of counts) to be treated as if the table had an entry with\nvalue equal to zero.\n\nExplicit null handling like the kind shown above can be done in, say, SQL, using\nthe COALESCE function. But in OCaml, the case analysis is obligatory – you\ncan’t silently use an optional value without contemplating what will happen in\nthe None case. Consider the following implementation of increment_count:\n\nlet increment_count table key =\n  Table.replace table key (Table.find table key + 1)\n\n\n\nWhile the analogous code would compile in SQL, the OCaml compiler will reject\nit, noting that the expression Table.find table key was used as if it were an\nint, but was in fact an int option. 
This may seem cumbersome, but in\nreality, the explicit separation of cases where null is and is not possible is a\ngreat relief, since it frees the programmer from worrying about the cases not\nflagged by the compiler.\n\nThe requirement to handle nulls explicitly doesn’t mean that we can’t reduce the\namount of boilerplate required. If we have a particular null-handling policy in\nmind, we can write helper functions to automate it. For example, we can write a\nfunction:\n\nlet option_get ~if_none option =\n  match option with\n  | Some x -&gt; x\n  | None -&gt; if_none\n\n\n\nThis allows us to rewrite our increment_count function as follows:\n\nlet increment_count table key =\n  let current_count = option_get ~if_none:0 (Table.find table key) in\n  Table.replace table key (current_count + 1)\n\n\n\nIndeed, in Jane Street’s Base library there is an\nOption\nmodule devoted to useful helper functions of this sort,\nincluding an option monad, which is a quite general and powerful\ntechnique for dealing with option-generating computations.\n\nOCaml’s approach is far from perfect. While OCaml is quite explicit about the\nuse of options, it is quite the opposite when it comes to exceptions.\nExceptions, like null references in Java, lurk in every function, and this is a\nvery real problem. Indeed, the inability to track exceptions in the type system\nhas led us to try to avoid exceptions except for truly exceptional conditions\n(and to use options instead).\n\n\n\nIf you’re interested in working at a place where functional programming meets\nthe real world, you should think about applying to Jane\nStreet. We’re always looking to hire\ngreat programmers with an interest in functional programming.\n",
        "url"      : "https://blog.janestreet.com/making-something-out-of-nothing-or-why-none-is-better-than-nan-and-null/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Another use for private type abbreviations",
        "date"     : "April 27, 2010",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "Early in ‘09, I put up a post asking Private type abbreviations, what are they\ngood for?. I got a lot of\ngood answers to that question, but I thought I would mention one more: using\nprivate types for encoding subtyping relationships in phantom types. Below is a\nsimple example, which is based on an example from a\nprevious post. This example\nproposes three kinds of ref’s, to be separated by phantom types:\n\n\n  readwrite: a ref that can be both read from and written to.\n  readonly: a ref that can only be read from (but someone else might be able\nto modify)\n  immutable: a ref that can not be modified under any circumstances.\n\n\nObviously, the last case, immutable, is rather silly for a ref. But for a more\ncomplex datastructure (an array, for example), it makes perfect sense.\n\nThere is a natural sub-typing relationship here. Both immutable and\nreadwrite refs can be used anywhere where one needs a readonly ref. In the\nfollowing, we represent this subtyping relationship using private type\nabbreviations.\n\ntype readonly\ntype readwrite = private readonly\ntype immutable = private readonly\n\nmodule Ref : sig\n  type +'a t\n  val create : int -&gt; readwrite t\n  val create_imm : int -&gt; immutable t\n  val set : readwrite t -&gt; int -&gt; unit\n  val get : 'a t -&gt; int\nend\n =\nstruct\n  type 'a t = int ref\n  let create x = ref x\n  let create_imm x = ref x\n  let set x v = x := v\n  let get x = !x\nend\n\n\n\nNote that we need define no explicit coercion functions in the interface. One\ncan simply use the :&gt; syntax to do whatever coercions are required. 
i.e., one\ncan write:\n\nlet x = Ref.create 3\nlet y = (x :&gt; readonly Ref.t)\n\n\n\nNote that it’s important to declare the phantom parameter as covariant (which is\nwhat the + in the type definition is for), since otherwise you won’t be able\nto cast the phantom parameter.\n\nI’ve come to think of private type abbreviations as one of the better ways of\ndesigning a phantom type. My general design preference goes something like this:\n\n\n  If you can, use uninhabited types. They’re the simplest thing, because there\nare no type equalities, and all coercions are explicit in the interface.\n  If subtyping really helps make the interface more usable, use private type\nabbreviations on top of uninhabited types.\n  Finally, and only if sorely pressed, should you use polymorphic variants or\nobject types. These are harder to understand, but also the most expressive\nchoice.\n\n",
        "url"      : "https://blog.janestreet.com/another-use-for-private-type-abbreviations/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Effective ML",
        "date"     : "April 22, 2010",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "A couple of weeks ago I visited Northeastern and Harvard where I gave guest\nlectures on the subject of programming effectively in ML. In both cases, I was\ntalking to a class full of undergraduates who had been studying ML for the\nsemester. It was great fun, and in the Harvard case, the lecture was taped, so I\nhope to eventually be able to post a link to it here.\n\nThe lecture I gave was in part inspired by a book I read years ago called\nEffective Java, by Josh Bloch. I\npicked up the book when I first came to Jane Street, while looking for\nsomething I could recommend to the people who were thinking about how to build\ntrading apps using C# and Java. Effective Java is organized as a series of\nshort lessons (or “items”, in the terminology of the book), well chosen, well\ndescribed, and with clear examples.\n\nIt’s not hard to see why a functional programmer would like Bloch’s book.\nConsider the following items:\n\n\n  Item 15: Minimize mutability\n  Item 16: Favor composition over inheritance\n  Item 22: Use function objects to represent strategies\n\n\nThese are all messages that come naturally to a functional programmer.\n\nThe lecture I gave was organized around a set of small lessons of similar scope\nthat I’ve learned about programming in ML over the years. Here’s the list of\nbullets I used:\n\n\n  Favor readers over writers\n  Create uniform interfaces\n  Make illegal states unrepresentable\n  Code for exhaustiveness\n  Open few modules\n  Make common errors obvious\n  Avoid boilerplate\n  Avoid complex type-hackery\n  Don’t be puritanical about purity\n\n\nA lot of these are ideas that I’ve talked about in previous posts, and some of\nthem are obvious to any experienced ML programmer from the heading alone. A few\nare perhaps worth more elaboration, and I hope to discuss some of them in later\nblog posts.\n\nIn the meantime, I’d be interested in hearing suggestions for other items to add\nto the list.\n",
        "url"      : "https://blog.janestreet.com/effective-ml/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "CUFP 2010 is coming!",
        "date"     : "April 22, 2010",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "CUFP is a yearly workshop for commercial users of functional programming. CUFP\nis aimed not just at industrial uses, but really at any uses of functional\nprogramming that are aimed at solving some pragmatic problem.\n\nThe workshop is co-located with ICFP, which is in Baltimore this year, and the\nscope of the workshop is bigger than usual. It spans two days rather than one,\nand in addition to the traditional talks, we will include a collection of\ninvited tutorials and some Birds-of-a-Feather sessions as part of the schedule.\nRight now, we’re actively soliciting proposals for talks. The talks\nthemselves are meant to fall into two formats:\n\nexperience reports, which are shorter (25 minute) talks recounting people’s\nexperience (successful or not!) using functional programming languages in a\npragmatic setting; and technical talks, longer (30-45 min) presentations\ncovering a technical technique or methodology, based on real-world experience\nwith FP.\n\nSo, if you have experience using a functional language in anger, consider\nsending in a proposal. You can email directly to me at yminsky at janestreet dot\ncom. You can find the call for presentations\nhere. The deadline is June 15th.\n\nAlso, check out the new CUFP website, which among other\nthings has videos of previous years’ presentations.\n",
        "url"      : "https://blog.janestreet.com/cufp-2010-is-coming/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Jane Street OCamldocs now available",
        "date"     : "November 11, 2009",
        "authorId" : "rdouglass",
        "author"   : "Ralph Douglass",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "I’m pleased to announce that we now have ocamldoc generated documentation\navailable for Type-conv, Bin-prot, Sexplib, and Core. You can find them here:\n\nhttp://www.janestreet.com/ocaml/janestreet-ocamldocs/\n\nThe module paths in the documentation for Core and Core_extended are relative\nto Core.Std and Core_extended.Std.\n\nYou can also find a tarball here:\n\nhttp://www.janestreet.com/ocaml/janestreet-ocamldocs-2009-11-11.tgz\n",
        "url"      : "https://blog.janestreet.com/jane-street-ocamldocs-now-available/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Core Gems: Time",
        "date"     : "November 7, 2009",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "This post is meant to be the first in a series highlighting various interesting\nfeatures of Core (although I should acknowledge that most of the continuing\nseries I’ve started so far have not, when it comes down to it, continued). This\ntime I wanted to focus on how Core handles time. Time is a\nsurprisingly complex topic, as anyone who has worked through the gory details of\ncalendrical calculations can tell you. One of the initial complexities is that\nthere are lots of different-but-related concepts that come into play, and in\norder to design a good library, you have to figure out how to reflect those\nconcepts in your datatypes. Here are the primary types that\nCore uses for dealing with time:\n\n\n  Time.t, an absolute time, i.e., a fully specified point in time,\nindependent of time zone or any other information.\n  Time.Span.t, a length of time, as in “5 minutes” or “3 hours”.\n  Time.Ofday.t, a time of day, as in, “3:53:12 PM”.\n  Date.t, a date, e.g. “2008-12-13”.\n  Weekday.t, a day of the week, e.g., “Monday”.\n  TZ.Zone.t, a timezone, e.g., “EST5EDT”. The combination of a date, a\ntime of day, and a timezone is sufficient to compute a Time.t.\n\n\nInterestingly, Time.t, Time.Ofday.t, and Time.Span.t share the same\nunderlying implementation: they are all floating point numbers representing a\nnumber of seconds. A Time.t is the same kind of float returned by the\ngettimeofday call in the standard library, basically a traditional UNIX\ntime. A Time.Ofday.t is implemented with a float representing the number of\nseconds since the beginning of the day, and a Time.Span.t is represented by a\nfloat representing the number of seconds in question.\n\nBy separating into three types rather than one, we get types that are more\ninformative and less error prone. 
For example, the functions Time.diff and\nTime.add have the following signatures:\n\nval diff: Time.t -&gt; Time.t -&gt; Time.Span.t\nval add: Time.t -&gt; Time.Span.t -&gt; Time.t\n\n\n\nThis stops you from making basic mistakes, like taking two absolute times and\nadding them together and expecting to get another absolute time.\n\nCore’s handling of time has a lot going for it. There are many useful\nfunctions, and it’s been reasonably well battle tested (although there are\nsurely bugs yet to be found). There is also the very useful TZ module, which\nis a reimplementation of UNIX’s timezone handling that uses the standard UNIX\ntimezone database. (“Why reimplement?” you may ask. It turns out that the libc\ncalls for doing timezone conversion require you to specify the timezone by\nstuffing it into the TZ environment variable before making the call. That\nmakes these calls painful and error-prone in the presence of threads.)\n\nThe biggest remaining problem we have is that time zone handling is not\nintegrated into the Time module itself – Time can only convert strings in\nlocaltime and UTC. Integration of TZ into Time is something we hope to get\ndone in the next release or two.\n",
        "url"      : "https://blog.janestreet.com/core-gems-time/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Optimizing List.map",
        "date"     : "October 10, 2009",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 5,
        "content"  : "With the latest release of Core, I’ve had occasion to think about how our\nlibraries differ from INRIA’s. One difference that’s been there from the very\nbeginning is in the List module: INRIA’s list functions are not\ntail-recursive, and ours are. The obvious reason to prefer\ntail-recursive solutions is that they are able to handle lists of arbitrary\nlength, but they have a downside as well. INRIA’s list implementation is not\ntail recursive largely because of performance, as described by Xavier\nhere. All that\nheap allocation you get by mapping and then reversing the list really costs.\n\nThe key tradeoff that Xavier points to is the choice between running fast on\nsmall-to-medium lists, and running at all on long lists. A few different people\nhave opined that lists don’t make sense for large datasets anyway, which argues\nin favor of the choice made in the standard library of using the non-tail-recursive\nversion. But I’ve never bought this argument. It’s hard to predict what your\ncode is going to be used for, and you don’t want to have landmines where your\ncode simply blows up (rather than degrading gracefully in performance) when run\non an unexpectedly large dataset. Using a non-tail-recursive list implementation\nleads to such brittle behavior.\n\nSo, it occurred to me, can we have the best of both worlds? As Xavier points\nout, there are “magic” implementations that use the dreaded\nObj.magic, but Obj.magic is notoriously difficult to reason about, and is\nreally the same thing as hacking the compiler. Among other things, it leaves you\nwith no guarantees when new versions of the compiler are released.\nExtLib takes this approach, but it’s\nnever been something we’ve been comfortable with.\n\nBut what if we write a version that detects when we’re dealing with a big list,\nand dynamically switches implementations? 
Here’s a simple version.\n\nopen Core.Std\n\nlet rec count_map ~f l ctr =\n  match l with\n  | [] -&gt; []\n  | hd :: tl -&gt; f hd ::\n    (if ctr &lt; 5000 then count_map ~f tl (ctr + 1)\n    else List.map ~f tl)\n\nlet map ~f l = count_map ~f l 0\n\n\n\nThis works a lot better. It’s a little bit slower than the standard List.map\nfor small lists, and about the same as the tail-recursive List.map for large\nlists. But we can do better still. There are two more optimizations I played\nwith. The first is to do a little loop unrolling on the recursion, and the\nsecond is to deal with the large-list case by going through arrays, as suggested\nin a post by Christophe\nTroestler. Here’s\nthe resulting code:\n\nopen Core.Std\n\nlet list_array_map ~f l =\n  Array.to_list (Array.map ~f (Array.of_list l))\n\nlet rec count_map ~f l ctr =\n  match l with\n  | [] -&gt; []\n  | [x] -&gt; [f x]\n  | [x;y] -&gt; [f x; f y]\n  | [x;y;z] -&gt; [f x; f y; f z]\n  | x :: y :: z :: w :: tl -&gt;\n    f x :: f y :: f z :: f w ::\n      (if ctr &gt; 500 then list_array_map ~f tl\n      else count_map ~f tl (ctr + 1))\n\nlet map ~f l = count_map ~f l 0\n\n\n\nThis implementation does better still. It’s actually faster than the standard\nimplementation on short lists, and only a little slower on long lists. Here are\nsome very rough benchmarks (done on an x86-64 box), where the mean and standard\ndeviations are of the ratio of the implementation versus the implementation in\nthe standard library. 
“core” is the implementation currently in Core, and\n“fast” is the above implementation.\n\n## list length 0 ##\ncore: mean 1.648838, nstdev 0.043502\nfast: mean 0.717259, nstdev 0.043177\n## list length 1 ##\ncore: mean 2.113085, nstdev 0.075585\nfast: mean 0.596140, nstdev 0.049489\n## list length 2 ##\ncore: mean 1.989603, nstdev 0.044707\nfast: mean 0.636450, nstdev 0.003376\n## list length 5 ##\ncore: mean 2.003528, nstdev 0.043638\nfast: mean 0.821950, nstdev 0.024802\n## list length 10 ##\ncore: mean 1.428904, nstdev 0.016536\nfast: mean 0.729491, nstdev 0.018766\n## list length 20 ##\ncore: mean 1.443628, nstdev 0.062703\nfast: mean 0.743741, nstdev 0.018308\n## list length 100 ##\ncore: mean 1.301089, nstdev 0.019097\nfast: mean 0.898968, nstdev 0.017839\n## list length 1000 ##\ncore: mean 1.719725, nstdev 0.025758\nfast: mean 0.950799, nstdev 0.018624\n## list length 25000 ##\ncore: mean 1.721275, nstdev 0.044541\nfast: mean 1.188690, nstdev 0.031437\n\n\n\nThe performance improvement seems worth the ugly code given how common map is. I\nsuspect some hack like this will make its way into a future version of Core.\n",
        "url"      : "https://blog.janestreet.com/optimizing-list-map/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Core 0.6.0 release",
        "date"     : "October 8, 2009",
        "authorId" : "rdouglass",
        "author"   : "Ralph Douglass",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "We are proud to announce the second major release of Core, Jane Street’s\nalternative to OCaml’s standard library. This release also includes\nCore_extended, which adds new functionality such as subcommand style command\nline argument handling, a procfs interface, readline support, and more.\nCore_extended is used heavily at Jane Street, but not systematically code\nreviewed in the same manner as Core. As was warned in the first release, the\ninterfaces to many modules have changed, so upgrade with care. Interfaces will\ncontinue to change with future releases.\n\nCore is intended to be used with OCaml 3.11.1. It will not compile with 3.10.\n\nWe have tested the code on Linux (Centos 5), but have only limited experience\nwith it on other platforms. It compiles on Mac OS 10.6, but has had almost no\ntesting on that platform, and hasn’t been tested at all on anything else.\n\nYou can find the library here:\n\nhttp://www.janestreet.com/ocaml\n\nalong with four other libraries that you will need to use along with it:\ntype-conv, sexplib, bin-prot, and fieldslib. These four libraries provide macros\nfor generating functions for serializing and deserializing types, and for\nfolding over records.\n\nIn addition, Core depends on Pcre and Res. Core_extended also depends on Pcre.\nYou can find these libraries at Markus’s website:\n\nhttp://www.ocaml.info/home/ocaml_sources.html\n\nIf you have any comments or patches, we’d love to hear about them.\n\nPatches should be sent to opensource@janestcapital.com.\n\nAll of the released libraries are licensed under the LGPL-plus-linking-exception\nthat is used by the OCaml standard library.\n",
        "url"      : "https://blog.janestreet.com/core-0-6-0-release/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Another JSSP post",
        "date"     : "September 29, 2009",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "Just thought I should point out another\npost,\nthis one from Patai Gergely, summarizing events at the JSSP end-of-summer\nmeeting, complete with pictures!\n",
        "url"      : "https://blog.janestreet.com/another-jssp-post/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Jane Street Summer Project round-up",
        "date"     : "September 23, 2009",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "We just had the end-of-summer meeting for this year’s JSSP, and this is my\npersonal summary of the event. We expect to post more information in the next\nfew days, including videos of the talks and photos.\n\nFrom my point of view, the single most useful project is unquestionably\nocamlviz. Ocamlviz is a realtime\nprofiling tool for OCaml, and I was really impressed with the system’s polish.\nThe design is carefully thought out; it seems to be quite well implemented; the\nfront-end has a surprisingly usable UI; there’s a nice looking website for it,\nand good documentation to boot. It’s really a fantastic effort, and I expect\nwe’ll be taking it for a spin on some of our own OCaml projects.\n\nI was also very pleased with the work that was done on the Moby scheme compiler\nfor smartphones. Here, the primary target\nis kids learning to program rather than sophisticated developers. Shriram’s talk\nincluded a description of the very cool\nbootstrap program, which teaches middle-school\nkids how to build their own video games in a scheme environment, while sneakily\nteaching them to understand both algebra and functional programming on the side.\nApparently there are plans to use Moby in class assignments at Brown this\nupcoming semester. I’m looking forward to the point where I can try this stuff\nout with my own kids.\n\nThose two were my personal favorites, but there was lots to like in the other\nprojects as well. I don’t know a ton about 3D rendering so I’m not really much\nof a judge, but the results from the\nLambdaCube project looked\ngreat. The goal is to get to the point where one could build a real 3D game in\nHaskell, which seems a laudable goal. It’s not quite there yet, but it looks\nlike the work over the summer gave them a chance to make some real progress. I\nwas also impressed with the commitment of the guys working on it. 
This is not\ngoing to be a project that peters out as soon as the summer is over.\n\nThe work on a self-adjusting computational geometry library was also really\ncool. I don’t yet have a link for the source code, but the algorithmic results\nwere impressive. One nice bit is that they were able to come up with some\nincremental algorithms that were asymptotically better than previously known\napproaches. The last project was\nArchimedes, which is a system for 2-D\nplotting and visualization in OCaml. While progress was made over the summer, it\nsadly is not quite at the stage where there is a usable library there. But work\non that will continue, and I have real hope that we’ll end up with a good\nlibrary in the end.\n\nWe ended the day with a fun talk from Chris Okasaki, reminiscing about the story\nbehind his deservedly famous book Purely Functional Data\nStructures.\nI think my favorite anecdote from the talk is about how the book suddenly\njumped in popularity when it was Slashdotted. Apparently, the review of the book\non Slashdot permanently raised the size of Okasaki’s royalty checks, to his\nwife’s continuing astonishment.\n\nThere was a dinner that I sadly was not able to attend, but I hear it was much fun.\nAll told, it was a great experience. I’m looking forward to next year!\n",
        "url"      : "https://blog.janestreet.com/jane-street-summer-project-round-up/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Designing a code-review tool, Part 2: Patches or Diffs",
        "date"     : "August 16, 2009",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "One of the key decisions to make when designing a code review system is choosing\nthe basic unit of code review. One approach common in many open-source projects\nis to be patch-centric, i.e. to make the reviewable unit a single patch. In\na patch-centric world, code review is complete when every patch that went into\nthe system has been read.\n\nWe instead decided to build tools that were diff-centric. In a diff-centric\nworld, reviews aren’t done on single patches, but instead between pairs of\npoints in history. Code review is complete when a path of diffs has been\ncompleted, starting from a fully read revision, and ending in the revision to be\napproved. Both patch-centric and diff-centric approaches are\nreasonable, but we settled on diff-centrism for a couple of reasons:\n\n\n  Merges complicate patch-based review. If you’re going to do code-review at\na patch level, one question you need to answer is: how do you review merges?\nIn Mercurial, for instance, merges are represented by changesets that have\ntwo parents, and it’s not quite clear how to read the resulting diffs – you\ncould read the diffs from both parents, but this is largely a recapitulation\nof the changesets being merged. One might be tempted to simply not review\nmerge nodes, since they typically don’t contain material changes. But merges\naren’t always trivial, so you ignore them at your peril.\n  Reading many small patches reduces the signal-to-noise ratio. People make\ncoding mistakes. If you read every little patch that someone has committed,\nyou will end up reading the outcome of decisions that were later reversed.\nDiff-centric code review allows you to skip over some of that noise. If a\npatch is committed that must later be backed out, then someone who is\nreading patches will end up reading both. In a diff-centric world, a\nreviewer who reads a diff that crosses over both the original patch and the\nbackout won’t have to review either.\n\n",
        "url"      : "https://blog.janestreet.com/designing-a-code-review-tool-part-2-patches-or-diffs/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Designing a code-review tool, Part 1",
        "date"     : "August 8, 2009",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "We’ve just rolled out a new software tool for managing our code review process.\nCode review is a pretty central part of how we try to maintain a high level of\nquality and safety for our critical software systems, and so a code review\nmanagement tool is an essential and long-overdue piece of infrastructure for us.\n\nThe new system is meant to facilitate the basic code review process we’ve been\nusing, and at the same time make it more flexible and scalable. Before this\ntool, our approach to reading code was pretty simple. We had a small set of\nreviewers who were responsible for reading every line of a handful of\nrisk-critical systems. If a given codebase had never been reviewed before, then\nthe code would be read from scratch. If what was being reviewed was a new\nrelease, then instead of reading from scratch, everyone would read diffs from\nthe last reviewed checkpoint.\n\nAll of this was tracked with simple, manually updated log files. The only\ntool we really had was a program for generating PDF diffs. This system worked\nreasonably well when we were small, but it didn’t scale as the amount of code\nreview and the number of people involved grew.\n\nI’ll talk in a later post about the design we ended up settling on, but for now,\nI’ll just go over some of what we wanted to achieve with our new tool.\n\n\n  Lightweight: We had adopted a practice of doing big-bang reviews. We’d\nwork on a system for a while, and then when we were basically finished, we’d\ninitiate a round of code review. More recently, it’s become clearer to us\nthat more frequent but smaller rounds of code review are preferable, for a\nvariety of reasons. The tools need to make catching up on code review\nlightweight enough to be something that people would be willing to do every\nfew days.\n  Granular: Our initial approach, which made sense when the codebase was\nsmall, was to have a handful of people review the entirety of the relevant\nsystems. 
As our systems have grown larger, we’ve needed to switch to an\napproach where many different people are involved, with different people\nassigned to different subsystems. The code review tools need to support\ngranular assignment of code review without making the interface too complex\nto use.\n  Hackable: We wanted the resulting system to store its data in a way that\nwas easy to hack, when necessary. We knew that there would be times where we\nwanted to edit history to fix some obscure problem or other, and we also\nwanted to be able to maintain a clear and complete history of what we’ve\ndone.\n\n\nNext time, I’ll talk a bit more about where these requirements led us.\n",
        "url"      : "https://blog.janestreet.com/designing-a-code-review-tool-part-1/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "OCaml in Japan, and its Meeting in Tokyo",
        "date"     : "July 13, 2009",
        "authorId" : "jfuruse",
        "author"   : "Jun Furuse",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "It might be surprising to hear that there are a significant number of OCaml\nusers in Japan, but it is true. OCaml has been used in programming courses of\nseveral major Japanese universities for several years. There are\nthree\npublished\nbooks\nabout OCaml in Japanese, and you can easily get them at large book stores.\n\nSome of us enjoy discussing OCaml programming in our blogs and net chats every\nday … mainly in Japanese. Probably that is why you might never have heard\nabout the far east OCaml riders.\n\nActually, apart from the handful OCaml programmers working in Jane Street Tokyo,\nwe ourselves have no clear idea how many we are. Far away from all the\ninteresting OCaml events held\nso\nfar, we have had no\nchance to meet together.\n\nOk, then, let’s have our own meeting, in Japan. We have planned our local\nmeeting, OCaml Meeting 2009 in Tokyo on 8/30, and announced it recently. Jane\nStreet has accepted to be one of our sponsors. European meeting organizers has\nkindly permitted use the same name for our meeting.\n\n\n  http://ocaml.jp/?Users%20Meeting (Japanese site)\n  http://wiki.cocan.org/events/ (English site)\n\n\nEven though the program is not yet announced completely, we have already 80\npossible participants, which forced us to stop the CFP currently. Yes, I am\nsorry but it may be too late for you to participate. But we plan recording talks\nand make them public, and if possible make them available live. Probably a nice\noccasion to see something familiar to you spoken in a strange Asian langauge!\n",
        "url"      : "https://blog.janestreet.com/ocaml-in-japan-and-its-meeting-in-tokyo/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "What do Haskellers have against exhaustiveness?",
        "date"     : "May 16, 2009",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "One of my favorite features of the Hindley-Milner type system is the built-in\nexhaustiveness checking that is applied to pattern matches. I like this feature\nenough that it is the focus of the only worked-out example I give in my\ntalk about OCaml at Jane Street.\n\nWhy is exhaustivenss checking so important? You can slog through my talk if you\nwant to hear a fuller account, but the basic point is that case analysis is one\nof the most common things one does in any program, and anything that can\nstatically check important properties of your case analysis is really helpful.\nMoreover, exhaustiveness checking serves as a kind of refactoring tool. Whenever\nyou expand the possibilities in a type, the compiler will point you to the\nplaces where you forgot to handle the new cases you just created. It is, from\nthe perspective of someone who has programmed in ML for a living for some years\nnow, an enormously useful feature.\n\nSo I was quite taken aback when one of our interns here (who will remain\nnameless, in case I’ve horribly misconstrued his words) pointed out to me that\nHaskell by default doesn’t even warn the programmer about inexhaustive\npattern-matches. In particular, if you save the following code into a file\ncalled ‘foo.hs’:\n\nfoo (Just x) = x + 1\nmain = print (foo Nothing)\n\n\n\nand the compile it, the compiler won’t make a peep. 
But when you run it, you\nwill of course get a runtime error:\n\nbash-3.2$ ghc -o haskell-foo foo.hs\nbash-3.2$ ./haskell-foo\nhaskell-foo: foo.hs:1:0-19: Non-exhaustive patterns in function foo\n\n\n\nIf you try something similar in OCaml:\n\nlet foo (Some x) = x + 1\nlet () = Printf.printf \"%d\\n\" (foo None)\n\n\n\nyou’ll get an earful from the compiler when you try to build it:\n\nbash-3.2$ ocamlc -o ocaml-foo foo.ml\nFile \"foo.ml\", line 1, characters 8-24:\nWarning P: this pattern-matching is not exhaustive.\nHere is an example of a value that is not matched:\nNone\n\n\n\nWe go even farther at Jane Street, where we use compiler flags to turn that\nwarning into an error, so that the code can’t build when the match is\nincomplete. When people want an incomplete match, they need to do it explicitly\nby adding a catch-all case and throwing an explicit exception, as follows:\n\nlet foo = function Some x -&gt; x + 1 | None -&gt; failwith \"Argh!\"\nlet () = Printf.printf \"%d\\n\" (foo None)\n\n\n\nIt is worth noting that all is not wine and roses in the ML world. Many, maybe\nmost, ML programmers basically ignore the compiler’s warnings about inexhaustive\nmatches, which makes the warning the compiler does give you kind of useless.\nStill, the lack of even an admonition from ghc surprises me.\n\nPresumably Haskell programmers are just as concerned with getting static\nguarantees as ML programmers are. So I’m wondering why the default compiler\nbehavior differs. I’d be interested in hearing from any Haskell\nprogrammers who can explain what’s going on.\n",
        "url"      : "https://blog.janestreet.com/what-do-haskellers-have-against-exhaustiveness/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Presenting the 2009 JSSP projects",
        "date"     : "May 12, 2009",
        "authorId" : "summerproject",
        "author"   : "Summer Project",
        "tags"     : [],
        "minsToRead" : 4,
        "content"  : "This year’s JSSP projects have been selected. We think it’s an exciting list of\nprojects, and we’re pleased that this year the projects support a number of\ndifferent programming language communities outside of OCaml. Here’s the list,\nalong with abstracts:\n\nOCaml: 2D Plotting Library\n\n\n  Student: Bertrand Desmons\n  Advisor: Christoph Troestler\n\n\nThe purpose of this project is to provide a high quality, platform-independent,\nand extensible 2D plotting library for OCaml. We aim to be attractive to\nstudents (no dependency apart from OCaml if desired), to professionals with\npublication quality needs (LaTeX output), and to researchers using visualization\nto help them understand data and/or develop algorithms (interactive backend).\n\nOCaml: Ocamlviz – Realtime Execution Profiling\n\n\n  Students: Julien Robert and Guillaume Von Tokarski\n  Advisors: Sylvain Conchon, Jean-Christophe Filliatre and Fabrice Le\nFessant\n\n\nThe goal of this project is to design and to implement a graphical tool for\nreal-time profiling of Objective Caml programs. Currently, profiling tools for\nOcaml are limited to post-execution analysis of call graphs. There is no\nconvenient way to display, during the execution of a program, how memory is\nused, how many times a function is called, or how the value of a global variable\nevolves. Ocamlviz will fill this gap.\n\nScheme: Scheme for smartphones\n\n\n  Students: Chris King and Danny Yoo\n  Advisors: Kathi Fisler and Shriram Krishnamurthi\n\n\nSmartphones are an exciting new computational platform. Scripts on these phones\nmust be reactive event processors able to sense and actuate. The PLT Scheme\nWorld library, which is designed for purely functional reactive programming, can\noffer developers a concise and elegant way of programming these platforms. We\npropose to design and implement a compiler from purely functional DrScheme\nprograms to Java programs that run on popular cell phones. 
The DrScheme\ncommunity will use this infrastructure to support its teaching program\n(distributed, reactive functional programming to audiences ranging from\nmiddle-schoolers to freshmen) and also for future research projects on mobile\ncomputing with functional DrScheme.\n\nHaskell: Lambda Cube 3D Engine\n\n\n  Student: Csaba Hruska\n  Advisor: Gergely Patai\n\n\nThis project aims to be the first general purpose 3D rendering engine written in\na pure functional language. There is no graphics library available for Haskell\nthat would be suitable as a basis for a complex graphical program. My Haskell\nrendering engine (called Lambda Cube Engine) uses the same model and material\nformat as Ogre3D (http://www.ogre3d.org). This choice is motivated by the fact\nthat Ogre3D has well-designed mesh model and material formats, and it also\nprovides exporter plugins for nearly every significant 3D modeling software\n(3DS-Max, Maya, XSI, Blender etc.). This design decision lets us reuse existing\n3D content and Ogre3D exporter plugins with ease. My knowledge of the Ogre3D\narchitecture will help in making correct design decisions during development.\n\nSML: Self-adjusting computational geometry library\n\n\n  Student: Duru Turkoglu\n  Advisor: Umut Acar\n\n\nFunctional languages such as SML are excellent for implementing complex\nalgorithmic software, because they provide clean interfaces to high-level data\nstructures (e.g., lists, vectors) and enable developing algorithms and data\nstructures in a composable fashion by offering a strong type system. 
In this\nproject, we aim to broaden the appeal of functional languages by making them\nattractive to practicing engineers, financial analysts, and scientists, as\nwell as researchers and students who develop interactive software systems.\n\nMore specifically, we consider a recent extension of the SML language called\nSaSML (Self-Adjusting SML), which enables developing self-adjusting programs that\ncan efficiently and automatically interact with changing data, and develop a set\nof libraries and applications in this language. The libraries that we develop\ntarget computational geometry and motion simulation, which form the basis of\nmany applications, including graphics, scientific computing and engineering,\nand finance.\n\nThis project builds directly on our previous work on computational geometry and\nmotion simulation, where we developed various research prototypes. This previous\nwork will be the starting point for our project, making it highly likely to\nachieve completion within a summer. We will need to port these libraries to the\nSaSML language and implement some new applications, notably a self-adjusting\nmeshing application. We expect that this work can be a starting point for\nfurther research into improved financial analysis techniques that rely on\nmeshing such as Black-Scholes-PDE based models or other applications that may be\nof interest to Jane Street Capital.\n",
        "url"      : "https://blog.janestreet.com/presenting-the-2009-jssp-projects/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Proposals in, application period is closed",
        "date"     : "April 1, 2009",
        "authorId" : "summerproject",
        "author"   : "Summer Project",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "The application period for this year’s summer project is now closed, and we have\nan interesting collection of proposals to choose between. The proposals use a\nnumber of languages (in particular, Scheme, SML, F#, Haskell and OCaml),\nlooking to address a wide variety of different problem types.\n\nWe’ll get out acknowledgments to everyone who sent in a proposal within the next\ncouple of days.\n",
        "url"      : "https://blog.janestreet.com/proposals-in-application-period-is-closed/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Caml Trading talk at CMU",
        "date"     : "March 12, 2009",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "I was at CMU several weeks ago, and gave a version of my “Caml Trading” talk\nthere. See below if you are interested in seeing the video. It’s a reasonably\ngood source if you’re interested in understanding more about how and why Jane\nStreet uses OCaml.\n\n\n  \n\n\n",
        "url"      : "https://blog.janestreet.com/caml-trading-talk-at-cmu/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Do you use FP as a means rather than an end?",
        "date"     : "March 4, 2009",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "If you do, you might want to consider submitting a proposal to the 2009\nCUFP (Commerical Users of Functional\nProgramming) workshop.\n\nCUFP brings together people from different industries who use different\nlanguages, where the common thread is the use of FP in a practical setting. And\nit’s been a quite vibrant event, attracting many interesting talks, and growing\nquite quickly, going from 25 registered participants in ‘04 to over 100 in ‘08.\n\nIt’s worth noting that CUFP isn’t just about commercial use, despite the name.\nIt’s meant for anyone who is using functional programming as a means rather than\nan end.\n\n\n",
        "url"      : "https://blog.janestreet.com/do-you-use-fp-as-a-means-rather-than-an-end/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Private type abbreviations, what are they good for?",
        "date"     : "February 12, 2009",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "I’m having a lot of trouble figuring out what private type abbreviations are\ngood for. Private type abbreviations arrived as a new feature in OCaml 3.11, but\nI still don’t know where I would want to use them.  Here’s a simple\nexample of a private type abbreviation. First, let’s write down a trivial module\nthat has an int type and a couple of trivial helper functions:\n\nmodule Int = struct\n  type t = int\n  let of_int x = x\n  let to_int x = x\nend\n\n\n\nThat’s not very interesting on its own, but we can apply a signature to make t\nprivate:\n\nmodule Priv : sig =\n  type t = private int\n  val of_int : int -&gt; t\n  val to_int : t -&gt; int\nend = Int\n\n\n\nAnd you can do something similar to make t abstract.\n\nmodule Abstr : sig =\n  type t\n  val of_int : int -&gt; t\n  val to_int : t -&gt; int\nend = Int\n\n\n\nThe question is, when would one prefer Priv to Abstr?\n\nNow, I’m aware that there are some optimization differences. For example, the\nfollowing code uses the slower compare_val to compare its ints:\n\nAbstr.of_int 3 &gt; Abstr.of_int 4\n\n\n\nwhereas this code uses a more efficient specialized-to-int comparator.\n\nPriv.of_int 3 &gt; Priv.of_int 4\n\n\n\nBut the difference here is really a weird compiler optimization difference.\nThere’s no fundamental reason that the abstract version shouldn’t be able to do\nthe same optimization as the private one. Semantically, there’s nothing\ninteresting going on here.\n\nThere is one thing that I know how to do with a private type that I can’t do\nwith an abstract type: coercions. In particular, this code compiles:\n\nlet five =\n  let x = Priv.of_int 4 in\n  (x :&gt; int) + 1\n\n\n\nwhereas the same code with the abstract type fails. 
But that doesn’t seem all\nthat interesting, since I can write the following essentially equivalent code:\n\nlet five =\n  let x = Abstr.of_int 4 in\n  Abstr.to_int x + 1\n\n\n\nI had some other theories about what one could do with private types, but none\nof them seemed to pan out. For instance, the following all fail to compile:\n\nlet p = Priv.of_int 4\n\nlet _ = p + 1\n\nlet _ =\n  match p with\n  | 0 -&gt; true\n  | _ -&gt; false\n\nlet _ =\n  let f x = (x :&gt; int) + 1 in\n  f p\n\n\n\nSo is there some deeper purpose here that I’m missing?\n\nTo be clear, ordinary private types are clearly useful, since they let you have\naccess to pattern matching and record fields, while still giving you many of the\nadvantages of a private type. But private type abbreviations are still pretty\nmuch a mystery to me.\n",
        "url"      : "https://blog.janestreet.com/private-type-abbreviations-what-are-they-good-for/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Summer Project '09",
        "date"     : "January 31, 2009",
        "authorId" : "summerproject",
        "author"   : "Summer Project",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "I am pleased to announce the Jane Street Summer Project for 2009! The goal of\nthe program is to encourage growth in the functional programming community. To\ndo that, we will fund students over the summer to work on open-source projects\naimed at making functional languages into more effective and practical tools for\nprogramming in the real world.\n\n\nWe’ve changed the name of the project from the OCaml Summer Project to the Jane\nStreet Summer Project to reflect a broader scope. While keeping a focus on\nOCaml, we are opening up the program to proposals supporting other functional\nprogramming languages. There are also some significant changes to the way in\nwhich the funding will work this year. You can find out more by reading the\nFAQ.\n",
        "url"      : "https://blog.janestreet.com/summer-project-09/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Lightweight versioning for lightweight protocols",
        "date"     : "December 3, 2008",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "At Jane Street, we often write OCaml programs that communicate over the network\nwith each other, and as such, we need to build lots of little protocols for\nthose programs to use. Macro systems like sexplib and binprot make the\ngeneration of such protocols simpler. The basic workflow is to create a module\nthat contains types corresponding to the messages in the protocol. Macros can\nthen be used to generate the serialization and deserialization functions. Just\nshare the protocol module between the different programs that need to\ncommunicate with each other, and –poof– you have a protocol. This is a\nhighly convenient idiom, and it makes it much easier to quickly throw together a\nnetworked application. But things get more complicated when you need to start\nchanging the protocol. In some cases, you can upgrade the entire system in one\nfell swoop. In that case, you can just modify your protocol, install the new\nsystem, and you’re off to the races.\n\nThe complicated (and more common) case is where you can’t afford to upgrade the\nentire system at once. Then you need to deal with version mismatches. The main\napproach we’ve taken to this problem is to make components support multiple\nversions of a given protocol at once. To do this, we keep around the modules\nthat describe old versions of the protocol, and keep explicit version numbers\nassociated with each protocol module. We then write conversion functions that\nallow for translation between different versions of the protocol, allowing one\nprogram to speak multiple versions of a protocol. When two different components\nneed to communicate, they first negotiate the version of the protocol they will\nspeak to each other, picking the largest version that they both support.\n\nThis approach works reasonably well, but it has some downsides. The translation\nfunctions are somewhat tedious to write, and therefore error-prone. 
And even\nthough the idea sounds simple, it’s hard to get the details right. We’ve had to\nplay around with a few different approaches to writing the conversion functions.\nOne approach is to write upgrade and downgrade functions from each version to\nand from its successor. You can then achieve any conversion by chaining the\nconversion functions together. Another approach we’ve tried is having another\nset of types that are an internal model of the communication protocol, and to\nhave conversion functions between each supported version and the model. Both\napproaches are workable, but each has its own advantages and disadvantages.\n\nThis is a problem that I’m sure many people have grappled with. I’ve just given\na quick overview of how we deal with it. I’d love to see other people’s comments\non how they’ve approached the same issues.\n",
        "url"      : "https://blog.janestreet.com/lightweight-versioning-for-lightweight-protocols/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "The OSP meeting",
        "date"     : "September 30, 2008",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "I’ve been meaning to write about the OCaml Summer Project end-of-summer meeting\nthat occurred on September 12th, but as those of you who read the news may have\nnoticed, it’s been a busy time in the financial markets. But I’ve gotten a\nmoment to catch my breath, so I thought I’d post my recollections. The meeting\nwas a great deal of fun. Participation was good, with students from every\nproject, and mentors from most of the projects. Matthias Felleisen was our\ninvited speaker, and he gave an excellent talk about the role of contracts and\nstatic typing in Scheme. Contracts themselves are a compelling idea, and one\nthat it seems like the ML community could learn from.\n\nMatthias was a particularly good match for the meeting because, coming from the\nscheme community, he has seen many of the issues that the OSP is trying to\naddress from a different perspective. Things like powerful macros, GUI toolkits,\ngood pedagogical support and IDE integration are things that various schemes\n(DrScheme in particular) have had in one form or another for quite a while.\nSimilarly there are developments in the Scheme world, notably the addition of\nstatic typing, which are old hat in ML.\n\nBut the best part of the meeting was seeing how the projects had progressed and\nmeeting the people who had worked on them. Some of the projects have resulted in\nsoftware that you can download and use right now, such as\nocamlwizard, menhir\nenhancements and delimited\noverloading. It seems like the ocamlwizard\nwork, along with the related project\nOCamlSpotter (by Jane Street’s\nown Jun Furuse), are leading INRIA towards extending the .annot files\ncurrently generated by the compiler to generally ease integration with editors\nand IDEs. The students working on Menhir added some much needed functionality to\nan already existing and capable tool. 
And the delimited overloading project has\nadded a really lovely syntax extension that significantly improves the\nreadability of numerical code in OCaml. These projects are all in a state where\nyou can actually grab them and use them today, which is a big part of what we\nwere pushing for with this year’s OSP.\n\nThe qtcaml project and the project to add a\nconcurrent GC to OCaml are steps along the way to more ambitious projects. The\nqtcaml guys worked on tools for auto-generating OCaml bindings from C++ code,\nwhich is a first step towards building a complete and up-to-date wrapping for\nthe large and ever-changing Qt toolkit. Their expectation is that the binding\ngenerator should be useful on its own. The parallel GC project has made a first\nproof-of-concept of a parallel GC in OCaml. It’s an exciting step, but it’s not\nclear to me where it goes from here, although we do expect to see a release of\ntheir current prototype. The performance of the prototype isn’t good enough to\nmake it worth using in practice, and it’s not clear how the work that has been\ndone could be migrated to the mainstream compiler.\n\nThe only other project is EasyOCaml. The schedule of that project was out of\nsync with the rest of the projects, and so we haven’t yet seen the final\nresults. Their goal is an important one: to make OCaml a better pedagogical\nlanguage. I have high hopes of this work getting integrated into the DrOCaml IDE\nand making an effective teaching platform for OCaml.\n\nOur real hope out of all this is that these projects continue to live on and\nthrive. As Matthias mentioned in his talk, people should take the summer\nprojects as seed money, helping start out something that lives on long after.\n",
        "url"      : "https://blog.janestreet.com/the-osp-meeting/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "A Working Programmer's Guide to Type-Indexed Values",
        "date"     : "September 23, 2008",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 10,
        "content"  : "Parametric polymorphism is a basic mechanism in ML for writing code that is\ngeneric, i.e., that can be used on multiple different types. To get the basic\nidea of how what parametric polymorphism is, think about the following simple\nexample.\n\nmodule M : sig\n  (* Takes a list and returns a stuttered version, e.g., [1;2;3] is mapped to [1;1;2;2;3;3] *)\n  val double : 'a list -&gt; 'a list\nend = struct\n  let rec double = function\n  | [] -&gt; []\n  | hd :: tl -&gt; hd :: hd :: double tl\nend\n\n\n\nIn the type signature for double, the expression 'a is a type variable,\nmeaning that this function can be used with an arbitrary type plugged in for\n'a. The reason that the type variable shows up is that the code of double\ndoesn’t depend in any way on the properties of the elements of the list.\n At first glance, parametric polymorphism doesn’t seem all that\npowerful. After all, how useful is it to write functions that can only be\ngeneralized over all possible types? More often than not you want to write\nfunctions that can be used over some narrow universe of types with particular\nproperties. Object oriented languages provide this functionality with subtyping,\nand Haskell lets you get at this with type-classes. What do you do in ML?\n\nIt turns out that ML does allow you to write functions like this in a quite\nstraightforward and ordinary way. A simple example can be found be examining the\nsignature of a standard sort function:\n\nval sort : cmp : ('a -&gt; 'a -&gt; int) -&gt; 'a list -&gt; 'a list\n\n\n\nThe signature above can be used on a value of any type, provided that you can\nalso provide a comparison function for that type. In other words, a polymorphic\nfunction can take advantage of the idiosyncratic capabilities of the types it\ndeals with, but the ability to take advantage of those capabilities must be\npassed in along with the values in question.\n\nThis is used in small ways throughout any non-trivial ML codebase. 
But we can\nuse this in a more structured way by creating what are sometimes called\ntype-indexed values. A type-indexed value is a value used to represent a set\nof capabilities associated with a type. Here’s an example of a simple\ntype-indexed value for capturing number-ness. In what follows, the type-indexed\nvalue is Num.Type.t, and the rest of the Num module is just utility\nfunctions to make the interface pretty.\n\nmodule Num : sig\n  module Type : sig\n    type 'a t\n    val int : int t\n    val float : float t\n  end\n\n  val (+) : 'a Type.t -&gt; 'a -&gt; 'a -&gt; 'a\n  val (-) : 'a Type.t -&gt; 'a -&gt; 'a -&gt; 'a\n  val ( * ) : 'a Type.t -&gt; 'a -&gt; 'a -&gt; 'a\n  val neg : 'a Type.t -&gt; 'a -&gt; 'a\n  val zero : 'a Type.t -&gt; 'a\n  val sum : 'a Type.t -&gt; 'a list -&gt; 'a\n  val sum_product : 'a Type.t -&gt; 'a list -&gt; 'a list -&gt; 'a\nend = struct\n  module Type = struct\n    module T = struct\n      type 'a t = {\n        plus : 'a -&gt; 'a -&gt; 'a;\n        mul : 'a -&gt; 'a -&gt; 'a;\n        neg : 'a -&gt; 'a;\n        zero : 'a;\n      }\n    end\n    open T\n\n    let int = { plus = Int.(+);\n                neg = Int.neg;\n                zero = Int.zero;\n                mul = Int.mul; }\n\n    let float = { plus = Float.(+);\n                  neg = Float.neg;\n                  zero = Float.zero;\n                  mul = Float.mul; }\n\n\n  end\n  open Type.T\n\n\n  let (+) typ x y = typ.plus x y\n  let neg typ x = typ.neg x\n  let zero typ = typ.zero\n  let ( * ) typ x y = typ.mul x y\n\n  (* Some derived operations *)\n  let (-) typ x y = typ.plus x (typ.neg y)\n  let sum typ l = List.fold_left ~init:typ.zero ~f:typ.plus l\n  let sum_product typ l1 l2 = sum typ (List.map2 ~f:typ.mul l1 l2)\nend\n\n\n\nYou’ll note that the definitions above of Type.int and Type.float are\nbasically boilerplate. 
Because the modules in question themselves have a fairly\nstandardized interface, we could instead use a functor to create these\ntype-indexed values without the boilerplate:\n\nmodule type Arith = sig\n  type t\n  val (+) : t -&gt; t -&gt; t\n  val mul : t -&gt; t -&gt; t\n  val neg : t -&gt; t\n  val zero : t\nend\nmodule Build_type(M:Arith) = struct\n  let typ = { Type.\n    plus = M.(+);\n    mul = M.mul;\n    neg = M.neg;\n    zero = M.zero;\n  }\nend\n\nlet int = let module Z = Build_type(Int) in Z.typ\nlet int64 = let module Z = Build_type(Int64) in Z.typ\nlet int32 = let module Z = Build_type(Int32) in Z.typ\nlet native = let module Z = Build_type(Native_int) in Z.typ\nlet float = let module Z = Build_type(Float) in Z.typ\nlet complex = let module Z = Build_type(Complex) in Z.typ\n\n\n\nThis is yet another advantage one gets from having standardized interfaces.\n\nIf type indexed-values look similar to Haskell’s type-classes, it’s because they\nare. In my limited understanding of Haskell, the implementation is similar as\nwell, in that under the covers, Haskell passes around dictionaries of functions\nwhich play the same role that the Type.ts play here.\n\nThe number typeclass described above is just an example, and not something I’ve\nfelt the need for in practice. But here are some places where we’ve used\ntype-indexed values to good effect:\n\nSerialization\n\nThe latest (unreleased) version of the bin_prot macros that we use for binary\nserialization and deserialization now come with a type-indexed value that ties\ntogether all the little bits that you need to use the library. Before we did\nthat, one could only instantiate useful bin-prot functionality using the module\nlanguage. Now, we can do it using ordinary polymorphic functions.\n\nLittle languages\n\nSometimes we design domain-specific languages embedded in the type system. It is\noften useful to have values representing the different types that can be\ngenerated in the language. 
For example, we use this as part of a set of SQL\nbindings to represent types that we know how to convert to and from SQL.\n\nContainers\n\nWe’ve started experimenting with type-indexed values representing the\ncontainer-hood of a given object. This is a little trickier than the previous\nexamples, since the type-indexed value has two type parameters, one for the type\nof the container, and one for the type of the elements of the container. In the\nend, this lets you write functions with signatures like\n\nmax: ('a,'b) Container.t -&gt; cmp:('a -&gt; 'a -&gt; int) -&gt; 'b -&gt; 'a\n\n\n\nand use it to find the maximum element of a list (where the type-indexed value\nhas type ('a, 'a list) Container.t) or an array (('a, 'a array) Container.t)\nor a string ((char,string) Container.t).\n\nType-indexed values obviously have their downsides: they can be somewhat\ninconvenient syntactically, since you need to explicitly pass them along; and\nthey sacrifice some performance, because they lead you to call closures where you\ncould otherwise call static functions that could be inlined. But\noverall, they are a flexible and elegant way of writing generic code in ML.\n",
        "url"      : "https://blog.janestreet.com/a-working-programmers-guide-to-type-indexed-values/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
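The type-indexed-value idea the post describes can be sketched with nothing but the standard library. The names below (`arith`, `int_arith`, `sum`) are illustrative stand-ins for the post's `Type.t` and its helpers, not the actual API:

```ocaml
(* A minimal sketch of a type-indexed value: a record of first-class
   functions, playing the role the post's Type.t plays. *)
type 'a arith = {
  plus : 'a -> 'a -> 'a;
  neg : 'a -> 'a;
  zero : 'a;
}

(* One value of the type-indexed record per concrete type. *)
let int_arith = { plus = ( + ); neg = (fun x -> -x); zero = 0 }
let float_arith = { plus = ( +. ); neg = (fun x -> -.x); zero = 0. }

(* A generic function written once, against the record, usable at any
   type for which an arith value exists. *)
let sum t l = List.fold_left t.plus t.zero l

let () =
  assert (sum int_arith [ 1; 2; 3 ] = 6);
  assert (sum float_arith [ 1.; 2.5 ] = 3.5)
```

As in the post, the record is just an explicit dictionary of operations, passed by hand where Haskell would pass a type-class dictionary implicitly.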
    
    
    
    {
        "title"    : "Centralizing distributed version control",
        "date"     : "September 17, 2008",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "We switched over to using Mercurial\nabout a year and a half ago (from\ntla/baz–don’t\nask), and it’s worked out quite well for us. One of the key issues we ran up\nagainst is how to come up with a reasonable workflow, one that allowed people to\nwork independently when they needed, and also included an effective central\ncoordination point. We ended up coordinating our development through the use of\na compile daemon. The basic idea of the daemon is simple: for any daemonized\ntree, the compile daemon maintains two repositories, a\n\nstaging repo, and a main repo. The staging repo is where you push something\nthat you want to be considered by the compile daemon, and the main repo is where\nyou go to grab a fresh copy of the tree. The workflow is pretty simple. The\nstaging repo is multi-headed, which is to say that there is no unique revision\nthat is the most up-to-date. Instead, there are a bunch of competing development\nheads. The compile daemon’s job is to consider each of these heads in turn for\ninclusion into the main repo. The daemon tests a head by merging it with the\ncurrent head of the main repo, compiling the resulting tree, and running unit\ntests. If any of those steps fail, the compile daemon gives up on that head. If\nall the steps succeed, than the new revision becomes the new head of the main\nrepo.\n\nOur approach differs from the normal suggestion for creating a primary repo,\nwhich is to use a pull model: you have someone who is responsible for pulling\nand merging patches from other people’s trees, and thereby generating a central\ntree that others can pull from. But the compile-daemon workflow seems to work\nbetter for us. For one thing, there is no single sensible maintainer for some of\nour trees. Our primary tree contains about 800kloc, and dozens of different\nprojects. We like to throw them all together so that the compile daemon can\ncheck that the whole damn thing builds. 
This disincentivizes developers from\nbreaking other people’s code, since in order to get their changes in, they need\nto fix every type error and every broken unit test their change would lead to.\nThe sheer size of the tree and the high rate of change make it hard for any one\nperson to serve as a gateway for changes to get through.\n\nAnother thing about this model is that you need an effective mix of\ntype-checking and unit-testing to make sure that the tests run by the compile\ndaemon are effective in weeding out broken code. It also greatly increases the\nincentive for programmers to write unit tests, since unit tests become a way for\nprogrammers to defend their code against breakage by others.\n",
        "url"      : "https://blog.janestreet.com/centralizing-distributed-version-control/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Building a better compare",
        "date"     : "September 3, 2008",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "In a recent post, I described some of the\nproblems associated with OCaml’s built in polymorphic comparison functions. So,\nif you want to avoid OCaml’s polymorphic compare, what are your options? One\napproach is to simply write one’s own comparison functions explicitly.\nUnfortunately, it’s hard to do that cleanly. Consider the case of writing a\ncomparison function for a simple variant type:\n\ntype t =\n| Foo of Float.t\n| Bar of Int.t\n| Snoo of String.t\n\n\n\nYour first approach to writing a comparison function might be something like\nthis:\n\nlet tag_to_int = function\n| Foo _ -&gt; 0\n| Bar _ -&gt; 1\n| Snoo _ -&gt; 2\n\n\nlet compare x y =\n  let x = Int.compare (tag_to_int x) (tag_to_int y) in\n  if x &lt;&gt; 0 then x\n  else match x, y with\n  | Foo x, Foo y -&gt; Float.compare x y\n  | Bar x, Bar y -&gt; Int.compare x y\n  | Snoo x, Snoo y -&gt; String.compare x y\n  | _, _ -&gt; assert false\n\n\n\nThis is decent, but it unfortunately uses a fragile match at the end, which\nmeans if you extend t by adding another variant, the compiler will not warn\nyou to extend the math in compare (although it will warn you about extending\nthe match in tag_to_int). We can make the compare function a little better by\ngetting rid of the fragile match:\n\nlet compare x y =\n  match x, y with\n  | Foo x, Foo y -&gt; Float.compare x y\n  | Bar x, Bar y -&gt; Int.compare x y\n  | Snoo x, Snoo y -&gt; String.compare x y\n  | (Foo _ | Bar _ | Snoo _), _ -&gt;\n    Int.compare (tag_to_int x) (tag_to_int y)\n\n\n\nThis produces better compiler errors, but it’s still painful and error-prone.\nThe situation is even worse with records, since you don’t get any kind of\nexhaustiveness check at all. 
You can get the right kind of checks by using the\npa_fields macro I mentioned earlier, but the\nresulting code is not nearly as efficient as a direct implementation.\n\nGiven the pain involved in writing manual comparison functions, I’ve come to\nthink that the right solution is to use macros for generating these functions,\nin the style of sexplib. As with sexplib, this makes it easy to override the\nstandard behavior for a particular type in a lightweight way, by simply writing\na custom function in the few places where that is necessary. The rest of the\ntime, the macros can do the work. I suspect that such a macro will make its way\ninto a future release of core.\n",
        "url"      : "https://blog.janestreet.com/building-a-better-compare/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
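The record case the post mentions can be made concrete with a small self-contained sketch; the record type and its fields here are illustrative, not from the post:

```ocaml
(* A hand-written, field-by-field compare for a record, illustrating the
   exhaustiveness problem the post describes: if a field is later added
   to t, nothing forces this function to be updated. *)
type t = { name : string; age : int }

let compare_t x y =
  (* Compare field by field, moving on only to break a tie. *)
  let c = String.compare x.name y.name in
  if c <> 0 then c else Int.compare x.age y.age

let () =
  assert (compare_t { name = "a"; age = 1 } { name = "a"; age = 2 } < 0);
  assert (compare_t { name = "b"; age = 0 } { name = "a"; age = 9 } > 0)
```

Unlike a variant match, there is no compiler warning waiting to catch the forgotten field, which is exactly why the post argues for generating such functions with a macro.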
    
    
    
    {
        "title"    : "Ask and ye shall receive",
        "date"     : "August 27, 2008",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "At least, if you ask with a nicely detailed bug report. Looks like\nthe missed optimization for equality on polymorphic variants I\nmentioned in a previous post has been fixed. It will\nbe interesting to see what effect this has on our codebase…\n\n",
        "url"      : "https://blog.janestreet.com/ask-and-ye-shall-receive/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "The perils of polymorphic compare",
        "date"     : "August 18, 2008",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "I have a love-hate relationship with OCaml’s polymorphic comparison functions,\nwhich I think I share with a lot of people who use the language. For those who\ndon’t know what polymorphic compare, a quick explanation. OCaml has a collection\nof functions for comparing values which, magically enough, can be used on\nvirtually any data type. For example, there is a comparison function in\nPervasives with the following signature:\n\nval (&gt;=) : 'a -&gt; 'a -&gt; bool\n\n\n\nIf you stop to think about it, this is a pretty surprising little function. How\ndoes one write a single function to compare ints or floats or lists or any\nrandom type cobbled together out of records, tuples, and variants? \nIt turns out that you can’t write such a function on your own, at least not in\nOCaml. Special compiler magic is required. There is a function included with the\nruntime called\n\ncompare_val which compares values by descending their structure recursively\nfollowing fairly simple rules. For example, records are compared field by field,\nmoving on to the next field only to break a tie with the previous field.\nVariants are compared first by their tags, and then, if the tags are equal,\ndescending recursively to the content. compare_val operates directly on the\nlow-level representation of an OCaml value, completely ignoring the type system.\n\ncompare_val is undeniably convenient. One lovely thing this allows is the\ncreation of polymorphic set and map types (a trick which is strangely not used\nin the standard library). And full comparisons aside, a simple built-in\nstructural equality test is useful in a wide variety of contexts.\n\nBut compare_val has its complications as well. The fact that it ignores the\ntype system means it crosses all abstraction boundaries. A classic example is\nthat of a set type. The physical structure of the binary tree used to represent\na set is generally not uniquely determined by the contents of the set in\nquestion. 
That means that structural equality on sets is (surprisingly) not\nequivalent to ordinary set equality. Thus, you get the following\nsituation.\n\nopen Core.Std\n\nlet s1 = Set.of_list [1;2;3]\nlet s2 = Set.of_list [2;1;3]\nlet () =\n  (* both assertions pass *)\n  assert (Set.equal s1 s2);\n  assert (s1 &lt;&gt; s2)\n\n\n\nSadly, there’s no way of convincing compare_val to use Set.equal to compare\nthe two sets, rather than descending into the structure of the values.\n\nAnother problem with compare_val is that it sometimes throws exceptions. If,\nin its traversal of a data structure, it encounters a function, an OCaml object,\nor a wrapped-up C object that doesn’t support comparison, then it will just throw an\nexception. This leads to a source of runtime errors that can be hard to stamp\nout.\n\ncompare_val is also fairly slow. The OCaml compiler will optimize away calls\nto compare_val where it can, but it can’t do it in all cases. It’s a big\nenough issue that when I think of things to do to optimize an OCaml program, one\nof the first things on the list is to stamp out unnecessary invocations of\ncompare_val.\n\nGiven the tradeoffs involved with polymorphic compare, what should one do about\nit? More on that in a later post.\n",
        "url"      : "https://blog.janestreet.com/the-perils-of-polymorphic-compare/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
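The exception-throwing pitfall described in the post can be demonstrated with only the standard library (no Core needed):

```ocaml
(* Polymorphic compare descends into representations, so it works fine on
   ordinary structural data, but raises at runtime as soon as it hits a
   functional value in the structure. *)
let () =
  (* Structurally identical data compares as equal... *)
  assert (compare [ 1; 2; 3 ] [ 1; 2; 3 ] = 0);
  (* ...but comparing closures raises Invalid_argument at runtime. *)
  match compare (fun x -> x) (fun x -> x) with
  | _ -> assert false
  | exception Invalid_argument _ -> ()
```

The failure only shows up when the comparison is actually executed, which is why the post calls it a source of runtime errors that is hard to stamp out.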
    
    
    
    {
        "title"    : "Better float unboxing",
        "date"     : "August 5, 2008",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "A couple of months ago, Pascal noticed some missed optimizations in OCaml’s\nfloat unboxing optimizations. In some cases, code that looked like it should be\ncompiled down to a sequence of allocation-free floating point operations turned\nout to involve quite a lot of allocation, with floats getting boxed and then\nimmediately unboxed for no purpose. The fact that the compiler missed this\nparticular optimization forced us in a few spots to do some ugly manual\ninlining, and generally made us sad.\n\nBut we are sad no more! We filed a bug report, and it just got fixed in OCaml’s\nCVS. You can see the details\nhere. Now all we’re waiting for\nis a fix to the missed optimization for equality on polymorphic\nvariants.\n",
        "url"      : "https://blog.janestreet.com/better-float-unboxing/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Folding over fields",
        "date"     : "June 21, 2008",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "One of the best features of ML is pattern matching. Pattern matching is\nessentially a way of writing a case analysis driven by the structure of the\ndata. The thing that makes pattern matching such a phenomenal tool is the\ntype-checking discipline that is associated with it. In particular, the compiler\nchecks that pattern matches are exhaustive and non-redundant. This is helpful\nwhen writing a case analysis for the first time, but the value of the technique\nreally shows itself as the code evolves. In the absence of such checks, it’s\neasy for a case analysis to silently become incomplete as the underlying data\nstructures change, thus letting bugs creep in.\n\nPattern matching makes it easy to make sure that a case analysis is exhaustive,\nbut there are other kinds of exhaustiveness that we might want static checks for\nwhere the compiler provides no help. One example we see a lot comes up in the\ncontext of validation. In a lot of the code we write, it is necessary to\nvalidate a configuration value represented as a record, and we want to make sure\nthat every record field is explicitly checked.\n\nAfter hearing me complain about how little help the language provided in this\ncase, Steven proposed an elegant solution: use a macro to generate for any\nrecord definition a function that folds over the fields in that record. 
Thus, if\nyou declare a record as follows:\n\nmodule M = struct\n  type t = { foo: int;\n        bar: float;\n        snoo: (int * string) list; }\n  with field_fold\nend\n\n\n\nyou would end up with the following signature\n\nmodule M : sig\ntype t = {\n      foo: int;\n      bar: float;\n      snoo: (int * string) list; }\n\n  val field_fold :\n    t -&gt; 'a\n    -&gt; foo : (int -&gt; 'a -&gt; 'a)\n    -&gt; bar : (float -&gt; 'a -&gt; 'a)\n    -&gt; snoo : ((int * string) list -&gt; 'a -&gt; 'a)\n    -&gt; 'a\nend\n\n\n\nBy writing your validation function around field_fold, you can make sure that\nthe validation catches every field in the record, even as the definition of the\nrecord changes over time. But this idiom is not really specific to the\nvalidation case. You can use it just as well in other situations where\nexhaustiveness is important, such as writing an equality function or writing\ncode to serialize a record.\n\nWe haven’t implemented this yet, but it’s coming. The version we’re going to do\nis a little bit more sophisticated than what I’ve described here. We already\nhave a set of macros for generating values representing each field in a record\nthat can be used for getting and setting values and which contain a string\nrepresentation of the name of the field. I expect our field_fold will be based\non these field representatives rather than just on the raw values. I expect this all\nwill eventually find its way into a public release of core.\n",
        "url"      : "https://blog.janestreet.com/folding-over-fields/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
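A hand-rolled version of the `field_fold` the post proposes the macro would generate (field names follow the post's example; the validator is an illustrative use, not the post's code):

```ocaml
(* The record from the post, plus the fold the macro would generate. *)
type t = { foo : int; bar : float; snoo : (int * string) list }

(* One labelled callback per field: the call does not type-check unless
   every field is handled, which is the exhaustiveness guarantee. *)
let field_fold t init ~foo ~bar ~snoo =
  init |> foo t.foo |> bar t.bar |> snoo t.snoo

(* A validator built on field_fold, accumulating error messages. *)
let errors t =
  field_fold t []
    ~foo:(fun v acc -> if v < 0 then "foo negative" :: acc else acc)
    ~bar:(fun v acc -> if Float.is_nan v then "bar is nan" :: acc else acc)
    ~snoo:(fun v acc -> if v = [] then "snoo empty" :: acc else acc)

let () =
  assert (errors { foo = 1; bar = 0.5; snoo = [ (1, "a") ] } = []);
  assert (List.length (errors { foo = -1; bar = Float.nan; snoo = [] }) = 3)
```

The point of generating `field_fold` rather than writing it by hand, of course, is that the macro keeps it in sync with `t` automatically.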
    
    
    
    {
        "title"    : "The dangers of being too partial",
        "date"     : "May 29, 2008",
        "authorId" : "pzimmer",
        "author"   : "Pascal Zimmer",
        "tags"     : [],
        "minsToRead" : 13,
        "content"  : "This article deals with some not well-known dark corners of the OCaml compiler\nand how to get around them to produce more efficient code. The bottom line is\nthat you should avoid using partial applications and instead prefer\neta-expanding your functions to the maximum. To understand why, let’s compare\nthe performance of the following definitions:\n\nlet f a b c = a + b + c\nlet g1 a l = List.fold_left ~f:(f a) ~init:0 l\nlet g2 a l = List.fold_left ~f:(fun b -&gt; f a b) ~init:0 l\nlet g3 a l = List.fold_left ~f:(fun b c -&gt; f a b c) ~init:0 l\n\n\n\nThose three versions are all semantically equivalent (as long as f does not\nperform mutations between arguments): they all return the sum of the elements in\nl plus a for each element. Yet, they have quite different execution times (n\nis the length of the list l, please ignore the last 3 columns for the moment):\n\n\n  \n    \n      n\n      g1\n      g2\n      g3\n      g4\n      g5\n      g6\n    \n  \n  \n    \n      0\n      0.18\n      0.11\n      0.12\n      0.14\n      0.12\n      0.10\n    \n    \n      1\n      0.39\n      0.43\n      0.22\n      0.25\n      0.35\n      0.24\n    \n    \n      5\n      1.16\n      1.78\n      0.64\n      0.62\n      1.54\n      0.23\n    \n    \n      10\n      2.16\n      3.49\n      1.11\n      1.14\n      2.73\n      0.36\n    \n    \n      100\n      20.32\n      33.69\n      10.19\n      10.25\n      24.45\n      2.85\n    \n  \n\n\nSo, what is going on here? The answer is in the way the OCaml compiler optimizes\ncalls to functions with multiple arguments. The lambda-calculus interpretation\nof f is of a function that takes one argument a, returns a closure (i.e. a\nfunction plus an environment containing the value of a), which in turn takes\none argument b, etc. 
Applying f to three arguments would then amount to\napplying them one by one, creating intermediate closures in the process.\n\nFor efficiency reasons, most functional languages compile f as a function of\narity 3, i.e. as a function optimized to take 3 arguments all at once. Most of\nthe time, functions are called with all their arguments anyway, so this\noptimization is a good thing. In essence, this is similar (but not quite\nequivalent) to the following non-curried version of f:\n\nlet f_non_curried (a,b,c) = a + b + c\n\n\n\nObviously, the curried version of f is strictly more expressive since it\nallows partial applications, and it avoids dealing with the allocation of the\ntuples (a,b,c), using three CPU registers or the stack instead. Still, there\nneeds to be a mechanism to deal with applications where the number of arguments\ndoes not match: inside List.fold_left, there is a call to the argument ~f with\ntwo arguments (the list element and the accumulated result). If the arity of\n~f is not 2, some generic functions caml_applyN and caml_curryN come into\nplay to decompose the applications into simpler steps, potentially creating the\nintermediate closures we talked about. You might have seen those functions show\nup in the output of gprof sometimes, but not many people are fully\naware of this internal mechanism (if someone knows of a good reference on this\nsubject, please report it, as I could not find any). If you are curious about\ntheir definitions, you can look at them in the (undocumented) output of\nocamlopt -dcmm.\n\nLet’s go back to our example and consider g3 first. 
One closure (the argument\nto List.fold_left) is initially allocated, but since this function is of arity\n2 (b and c), all the calls from List.fold_left are direct and require no\nallocation.\n\nOn the other hand, g1 also allocates a closure initially, but it is of a\ndifferent nature: it is a call to caml_curry3 with f and a as arguments.\nThis function creates a closure of arity 1, simply waiting for the next\nargument, so that when List.fold_left calls it, there is an arity mismatch. We\nthen end up passing in the first argument b, creating a new closure, and\nfinally passing in the second argument c. In the end, we allocate one extra\nclosure per element in the list (and execute more instructions), which is why\ng1 is twice as slow as g3.\n\ng2 is even worse: the initial allocation creates a closure of arity 1 (waiting\nfor the argument b). There is a first mismatch when List.fold_left calls it,\nas with g1, so we go into slow mode and pass in b only. But then there is a\nsecond mismatch inside the code of the local function when we call f (of arity\n3) with only the two arguments a and b. Arguments are passed in one-by-one\nand we create two intermediate closures in the process. Finally, c, which was\nput aside by List.fold_left, is applied in a final step. In the end, g2\nallocates two closures per element in the list and is three times as slow as\ng3.\n\nThose interpretations can be checked by measuring the number of allocated words\nwith (Gc.stat ()).Gc.minor_words before and after a function call (I thank\nStephen Weeks for this idea). 
The following results confirm how much GC pressure\nis created by g1 and g2 compared to g3 (a closure typically takes 4 or 5\nwords in our example):\n\ng1        g2        g3    g4    g5        g6\n5 + 5n    4 + 10n   5     5     5 + 5n    5\n\nIt is worth noting that the actual cost does not come so much from the\nallocating part (it is just a few extra instructions after all), but from the\nGC: reclaiming the memory back is quite expensive and this cost is amortized\nover all allocations. Reducing the total amount of allocation will automatically\nreduce the amount of GC pressure. There is an old\npage\nwritten by Pierre Weiss that basically says that a block allocation is roughly\nequivalent to 100 additions on average (taking into account the cost of\nreclaiming the memory by the GC). Those results are outdated so they probably do\nnot apply anymore, but they give the right order of magnitude.\n\nA different style\n\nIf we really wanted to use a partial application as in g1, we would need to\ntell the compiler that f is a function that takes one argument a and returns\na function taking two arguments. Ideally, we would like to write one of the\nfollowing:\n\nlet f2 a = fun b c -&gt; a + b + c\nlet f2 = fun a -&gt; fun b c -&gt; a + b + c\n\n\n\nSadly, the OCaml parser still interprets those definitions as a function of\narity 3. We have to write the following cumbersome version to make it work:\n\nlet f2 a = (); fun b c -&gt; a + b + c\nlet g4 a l = List.fold_left ~f:(f2 a) ~init:0 l\n\n\n\n(or any other no-op operation instead of ()). 
This does indeed solve the\nproblem: if you look at the timings above, you will see that g4 is\nabout as fast as g3, and for good reason, since they now do essentially the\nsame operations and allocations.\n\nBut what if we accidentally use f2 combined with the eta-expanded style of\ng3?\n\nlet g5 a l = List.fold_left ~f:(fun b c -&gt; f2 a b c) ~init:0 l\n\n\n\nAs you can see in the timings, things get much worse again, since for every list\nelement, a closure will be created from the application of f2 to a.\n\nWhether to go with g3 or g4 is a matter of style: if you know the arguments\nare always going to be decomposed in the same way, you can use g4. But if you\nuse your function sometimes partially, sometimes with all its arguments, or with\ndifferent levels of partial application, you have to resort to g3. Since this\nstyle is also optimal in any scenario, it is probably the right choice in most\nsituations.\n\nAs an aside, there is another major optimization down the road. Instead of\ncalling the generic List.fold_left (or any other such function), we could\ndefine a local recursive function that calls f directly:\n\nlet g6 a l =\n  let rec fold_left accu = function\n  | [] -&gt; accu\n  | (x::r) -&gt; fold_left (f a accu x) r\n  in\n  fold_left 0 l\n\n\n\nAs you can see from the timings, this would give you another factor of 3 in this\nparticular case (mostly because of inlining, more efficient loops and direct\nbranching instead of function calls). It might be possible to perform this\nsyntactic transformation automatically through a Camlp4 extension, however we\nhave not tried it yet (and one should consider the impact on code size and\ninstruction caching in the CPU).\n\nInlining\n\nAnother nasty effect of partial applications is that they prevent inlining. It\nis a common style to define small generic functions and make specialized\nversions through partial application. 
Here is a somewhat extreme example:\n\nlet mult_and_add m c x = m * x + c\nlet double_and_add_v1 = mult_and_add 2\nlet double_and_add_v2 c x = mult_and_add 2 c x\n\n\n\nIf you look at the assembly code generated for those functions (with\nocamlopt -S), you will see that the second version gets compiled as only two\ninstructions, does not require any allocation and is likely to be inlined\nfurther at calling sites. The first version, on the other hand, creates one\nclosure from the partial application at startup, another one on every call (!)\nand is composed of many more instructions. I ran some benchmarks and found a\nspeedup factor of about 15 between the two versions.\n\nYet another trick\n\nThis is a somewhat related trick that applies whenever you have the following\n(quite common) scheme:\n\nList.iter ~f:(fun x -&gt; do_something ... x ...) l\n\n\n\n(in most cases with List.iter, List.map, List.fold_left, etc). Every time\nthis code gets executed, a new closure gets allocated for the local function\nbefore the call to List.iter. If the argument list l is going to be empty a\nlarge fraction of the time, the closure will usually be useless. This\ncan get quite expensive in the long run and can be avoided with the following\ntrick:\n\nif l &lt;&gt; [] then List.iter ~f:(fun x -&gt; do_something ... x ...) l\n\n\n\nSimilarly, if you know that l is likely to contain only 0 or 1 element, you\ncan specialize even further:\n\nmatch l with\n| [] -&gt; ()\n| [x] -&gt; do_something ... x ...\n| _ -&gt; List.iter ~f:(fun x -&gt; do_something ... x ...) l\n\n\n\nYou will save the closure allocation in most cases, and the function call in the\n1-element case is a good candidate for inlining.\n\nNote that this trick does not apply when the local function does not have any\nfree variables (i.e. when it does not reference any value outside the scope of\nthe local function). 
In this case, no closure is necessary and the function is\nfully allocated at compile time.\n\nDoes it really matter?\n\nIt really depends on what kind of code you are writing. If your application is very\nsmall, runs in a short period of time or does not do enough allocation to ever\ntrigger the GC, then you probably don’t care. But for applications with\nhigh-performance requirements or for central libraries that are used everywhere,\nit can make a big difference. Here at Jane Street, we achieved significant\nspeedups with these simple transformations on one of our biggest and most\nspeed-sensitive applications.\n\nI have to agree that eta-expanding all the functions in my code is\nintellectually less satisfying than using higher-order code and partial\napplications. Readability can also suffer, although there are situations\nwhere it benefits from the change, making it more explicit what the different\narguments are.\n",
        "url"      : "https://blog.janestreet.com/the-dangers-of-being-too-partial/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
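The Gc-based measurement the post describes can be sketched with only the standard library (the post's examples use Core's labelled `fold_left`; the stdlib version below is unlabelled, and `words_allocated` is an illustrative helper):

```ocaml
(* Compare minor-heap allocation of the partially applied version (g1)
   against the eta-expanded one (g3), using Gc.minor_words as the post
   suggests. Exact word counts depend on the compiler and backend. *)
let f a b c = a + b + c

let g1 a l = List.fold_left (f a) 0 l                 (* partial application *)
let g3 a l = List.fold_left (fun b c -> f a b c) 0 l  (* eta-expanded *)

(* Run a thunk and report (result, minor words allocated during the call). *)
let words_allocated thunk =
  let before = Gc.minor_words () in
  let result = thunk () in
  (result, Gc.minor_words () -. before)

let () =
  let l = List.init 10_000 (fun i -> i) in
  let r1, w1 = words_allocated (fun () -> g1 1 l) in
  let r3, w3 = words_allocated (fun () -> g3 1 l) in
  assert (r1 = r3);  (* same sum either way *)
  Printf.printf "g1 allocated %.0f words, g3 allocated %.0f words\n" w1 w3
```

On a stock native-code compiler this makes the per-element closure churn of `g1` visible directly, without reaching for a profiler.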
    
    
    
    {
        "title"    : "Ensuring that a function is polymorphic",
        "date"     : "May 14, 2008",
        "authorId" : "sweeks",
        "author"   : "Stephen Weeks",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "Here’s a little trick that I find useful when I get a type error due to a\nfunction that I believe is polymorphic, but isn’t due to some bug. For example,\nsuppose I had a function\n\nf that I believed was of type 'a list -&gt; 'a list, but really wasn’t.\n\nlet f x = 13 :: x (* suspend disbelief -- pretend f is large and complex *)\nlet l = f [\"foo\"]\n\n\n\nIf I feed this to OCaml, I get an error message at the point f is applied\nsaying:\n\nThis expression has type string but is here used with type int\n\n\n\nI would like to find out what the error in f is that makes it not polymorphic.\nWhen I first came to OCaml from SML, I was surprised to find the following did\nnot work.\n\nlet f (x : 'a list) = 13 :: x\nlet l = f [\"foo\"]\n\n\n\nIn SML, type variables in expressions are universally quantified (at a point\ndetermined by some complex rules), while in OCaml they are not. So, while SML\nwould reject the definition of f, OCaml happily unifies a with int and\ncontinues.\n\nIn OCaml, one can get universally quantified type variables by using the\nsignature language.\n\ninclude (struct\n  let f x = 13 :: x\nend : sig\n  val f : 'a list -&gt; 'a list\nend)\nlet l = f [\"foo\"]\n\n\n\nThis fails with a more helpful error message.\n\nSignature mismatch:\nModules do not match:\n  sig val f : int list -&gt; int list end\nis not included in\n  sig val f : 'a list -&gt; 'a list end\nValues do not match:\n  val f : int list -&gt; int list\nis not included in\n  val f : 'a list -&gt; 'a list\n\n\n\nHowever, it’s a lot of syntax to use the signature language, and can be\ndifficult if the function you want is not at the top level. Furthermore, you may\nnot want to write out the full type – perhaps you only want to add enough of a\nconstraint to catch the error. In SML, I just had to write the constraint on x\nand I was done. 
Fortunately, one can approximate the SML solution in OCaml by\nusing a new type that has no values.\n\ntype z\nlet f (x : z list) = 13 :: x\n\n\n\nThis fails with an error message at the use of x that is quite helpful.\n\nThis expression has type z list but is here used with type int list\n\n\n\nIf f actually were polymorphic, then instantiating the polymorphism with a new\ntype should succeed, and I would get an error later in the code at a use of f.\nSo, using this trick I can now focus on f until I fix all its type errors, at\nwhich point I can remove the constraint and type z.\n",
        "url"      : "https://blog.janestreet.com/ensuring-that-a-function-is-polymorphic/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
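Later versions of OCaml also offer a lighter-weight way to demand polymorphism than either trick in the post: an explicitly polymorphic type annotation (`'a.`), which the compiler actually checks. With the bug present (`let f : 'a. 'a list -> 'a list = fun x -> 13 :: x`), the definition itself is rejected with a pointed error; the correct version below compiles:

```ocaml
(* An explicitly polymorphic annotation: unlike the plain ('a list)
   constraint from the post, the compiler verifies that f really is
   polymorphic rather than silently unifying 'a with int. *)
let f : 'a. 'a list -> 'a list = fun x -> x @ x

let () =
  assert (f [ "foo" ] = [ "foo"; "foo" ]);
  assert (f [ 1; 2 ] = [ 1; 2; 1; 2 ])
```

This gets the SML-style behavior the post wants without introducing a signature or an uninhabited helper type.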
    
    
    
    {
        "title"    : "Core Principles: uniformity of interface",
        "date"     : "May 11, 2008",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 4,
        "content"  : "This is intended to be the first in a series of posts talking about the design\nprinciples behind core, Jane Street’s alternative to OCaml’s standard library.\n\nIt’s worth noting that we haven’t quite fully achieved any of our design goals.\nCore is at the center of a complicated and evolving software infrastructure,\nand it takes longer to force changes through that infrastructure that it does to\nfigure out what changes should be made. So these principles serve as both a\nguide to how the library is currently laid out as well as an indication of what\nkinds of changes are likely to come over the next year or so. The principle I’m\ngoing to talk about in this post is the idea of uniformity of interface. There\nare a few basic reasons for keeping interfaces uniform: first, to make it easier\nfor people to learn and remember a module’s interface; second, to make it easier\nto use functors to extend a module’s functionality; and third, to avoid wasting\ntime on making essentially trivial design decisions over and over. The last one\nis a bit surprising but is nonetheless real. When you have a significant number\nof people collaborating on a code base, having standards for how that code is to\nbe written eliminates a lot of pointless decision-making about how things should\nbe done.\n\nHere are a few of the design ideas we’ve had that we try to apply uniformly:\n\nTypes and modules\n\nIn core, almost all types have dedicated modules, with the type associated\nwith a module called t. This is not an uncommon pattern in OCaml code in\ngeneral and in the standard library in particular, but in core, the approach is\ntaken more consistently. Thus, core has modules for float, int, option and\nbool. This is convenient both because it provides natural place to put\nfunctions and values that otherwise just swim around in Pervasives, and\nbecause it makes the naming easier to remember. 
For instance, the modules\nBool, Float and Int all have to_string and of_string functions.\nSimilarly, the Int module has the same basic interface as the Int64, Int32\nand Nativeint modules.\n\nt comes first\n\nOne choice that you have to make over and over again in any library is the order\nin which arguments are listed. One thing you could optimize for when making this\ndecision is the ease of use for partial application. This is not a crazy\napproach, but it’s often hard to guess in advance which order will be most\nuseful. There are other things to consider as well: putting a function argument\n(e.g., the function that you pass to List.map) at the end often increases\nreadability, since the function argument can be quite large and is often awkward\nsitting in the middle of the argument list. Sadly, this often conflicts with the\nmost useful order for partial application.\n\nRather than make idiosyncratic choices on a function-by-function basis, we\nprefer to have clear and unambiguous rules where possible. One such rule we’ve\n(mostly) adopted is, within a module whose primary type is t, to put the\nargument of type t first. Thus, Map.find, Hashtbl.find and Queue.enqueue\nall take the container type first. This rule doesn’t lead to an optimal choice\nfor every function, but it is very convenient, and is simple and easy to apply\nconsistently.\n\nExceptions, options and function names\n\nIn core, the default functions only throw exceptions in truly exceptional\ncircumstances. Thus, List.find returns an option rather than throwing\nNot_found. That said, there are cases where the exception-throwing version of\nthe function is useful as well. The convention we now use is to mark the\nexception-throwing version of a function with _exn. 
So, we have Map.find and\nMap.find_exn, Queue.peek and Queue.peek_exn, and List.nth and\nList.nth_exn.\n\nStandardized interface includes\n\nThere are a number of standardized interfaces that we use as components of lots\nof different signatures. Thus, if you had a module representing a type that\ncould be converted back and forth to floats, supported comparison and has its\nown hash function, you could write the interface as follows:\n\nmodule M : sig\n  type t\n  include Floatable with type floatable = t\n  include Comparable with type comparable = t\n  include Hashable with type hashable = t\nend\n\n\n\nBy making some of core’s conventions explicit, it makes it easier to enforce\nthese conventions, so that parallel functions are forced to have the same name\nand type signatures across many different modules. It also makes it easier to\ndesign functors on top of these modules. So, for example, the Piecewise_linear\nmodule contains a functor that takes as its input any module which is both\nfloatable and sexpable.\n",
        "url"      : "https://blog.janestreet.com/core-principles-uniformity-of-interface/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Core has landed",
        "date"     : "May 2, 2008",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "We are proud to announce the first public release of core, Jane Street’s own\nalternative to OCaml’s standard library. We use this library as the base for our\nown development, and we hope people on the outside will find some use for it as\nwell. People should be warned that\n\ncore is still in flux: there are interfaces that we have plans to change, so\nif you’re not willing to come along for the ride, you shouldn’t use it. Also, be\nwarned that conformance with the OCaml standard library is not a goal, and we\nhave already deviated from it in a number of ways. It’s also worth noting that\nwe have only used and tested this library on x86 and x86-64 on Linux, and we\nmake no claims about other platforms.\n\nYou can find the\nlibrary here, along with\nthree other libraries that you will need to use along with it: type-conv,\nsexplib and bin-prot. These three libraries provide macros for generating\nfunctions for serializing and deserializing types. sexplib uses a\nhuman-readable s-expression format, and bin-prot uses a high-performance\nbinary protocol, and type-conv is the common base of the other two libraries.\nThis is also the first public release of bin-prot, and like sexplib, that\nlibrary can be used independently of core.\n\n(Update: as was noted by some commentors, the packages\nounit and\nres are also required to build\ncore.)\n\nIf you have any comments or patches, we’d love to hear about it. The blog is a\ngreat place for comments, and patches should be sent to\nopensource@janestcapital.com.\n\nAll of the released libraries are licensed under the\nLGPL-plus-linking-exception that is used by the OCaml standard library.\n\nADDENDUM: Core and the related libraries have been relicensed under the\nApache2 license.\n",
        "url"      : "https://blog.janestreet.com/core-has-landed/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "using with type on a variant type",
        "date"     : "April 25, 2008",
        "authorId" : "sweeks",
        "author"   : "Stephen Weeks",
        "tags"     : [],
        "minsToRead" : 3,
        "content"  : "Here’s a type-checking problem I ran into today. I had a module with a variant\ntype matching a signature that exposed the variant type.\n\nmodule type S = sig\n  type t = A | B\nend\n\nmodule M : S = struct\n  type t = A | B\nend\n\n\n\nI wanted to extend the module with some new functions, and match a new signature\nthat extended the original signature. Easy, right?\n\nmodule type S' = sig\n  include S\n  val f : t -&gt; t\nend\n\nmodule M' : S' = struct\n  include M\n  let f = function A -&gt; B | B -&gt; A\nend\n\n\n\nIt was important to me to be able to include S in the new signature and\ninclude M in the new module to avoid duplicating code.\n\nThen I hit a snag. As the code above stands, the two types, M.t and M'.t are\ndifferent. We have a large codebase here at Jane Street, and there was some\nexisting code that used the old module, M, and some other code that would use\nthe new module M. I don’t want to change all of our code to use the new\nmodule, and I want our code to be able to interoperate – I don’t want two\ndifferent types floating around.\n\nSimple, right? Just use with type. That is, define S' as follows.\n\nmodule type S' = sig\n  include S with type t = M.t\n  val f : t -&gt; t\nend\n\n\n\nUnfortunately, that gives the following error.\n\nIn this `with' constraint, the new definition of t does not match its original definition in the constrained signature:\nType declarations do not match:\n  type t = M.t\nis not included in\n  type t = A | B\n\n\n\nThe with type would all work if we hadn’t exposed the variant in the original\nsignature (check for yourself and see). But that’s not viable – I wanted to\nexpose the variant.\n\nI talked with some people here and we came up with a workaround, but I’d like to\nknow if someone has a better one. 
Here’s our workaround.\n\nmodule M = struct\n  type t = A | B\nend\n\nmodule type S = sig\n  type t = M.t = A | B\nend\n\nmodule type S' = sig\n  include S val f : t -&gt; t\nend\n\nmodule M' : S' = struct\n  include M let f = function A -&gt; B | B -&gt; A\nend\n\n\n",
        "url"      : "https://blog.janestreet.com/using-with-type-on-a-variant-type/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "OCaml Annoyance #23: type declarations are implicitly recursive",
        "date"     : "April 16, 2008",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "Unlike let declarations, type declarations in OCaml are automatically\nrecursive. This seems harmless at first, but it actually causes more trouble\nthan it’s worth. To see why, let’s look at an example. Here’s a simple signature\nthat uses nested modules and that adopts the reasonable convention of using t\nfor the primary type of a module.\n\nmodule Thing : sig\n  type t\n\n  module Collection : sig\n    type t\n  end\n\n  val create : string -&gt; t\n  val collect : t list -&gt; Collection.t\nend\n\n\n\nUnfortunately, implementing this is made more painful by the fact that type\ndeclarations are recursive by default. If you do the obvious thing:\n\nmodule Thing = struct\n  type t = string\n\n  module Collection = struct\n    type t = t list\n  end\n\n  let create x = x\n  let collect xs = xs\nend\n\n\n\nYou get the following error:\n\nFile \"type_is_implicitly_rec.ml\", line 5, characters 9-20:\nThe type abbreviation t is cyclic\n\n\n\nYou can fix this by introducing an extra dummy type definition in between to\nbreak the recursion:\n\nmodule Thing = struct\n  type t = string\n\n  module Collection = struct\n    type z = t\n    type t = z list\n  end\n\n  let create x = x\n  let collect xs = xs\nend\n\n\n\nThis works, but it’s ugly and kind of confusing.\n",
        "url"      : "https://blog.janestreet.com/ocaml-annoyance-23-type-declarations-are-implicitly-recursive/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "The ML sweet spot",
        "date"     : "April 2, 2008",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "I just got back from visiting Northeastern and Harvard where I yet again flogged\na version of my POPL talk.\nOlin Shivers was my host at Northeastern\nand Greg Morrisett at Harvard. It was a\nbit of a rushed visit, but a lot of fun nonetheless.\n\nBoth Greg and Olin are very interested in making the next big jump in\nprogramming languages, and they both think that that next step will require\nbetter ways of reasoning statically about programs. I think they’re dead-on in\nterms of what the right direction to go is, but I think they’ve got their work\ncut out for them. It will be hard to beat ML because ML sits in a kind of sweet\nspot; make it a little bit better in one aspect, and you give something up in\nanother. I can think of two ways in which this is true. The first has to do with\nthe expressiveness of the type system. If you make the type system much more\nexpressive, you generally need to give up a lot on the type-inference side and\nin the quality and comprehensibility of error messages. You can actually see\nthis at work in OCaml. Some of the more advanced features, like polymorphic\nvariants and objects, do indeed suffer from more obscure error messages and\nworse type inference than the rest of the language. I think polymorphic variants\nin particular are worth the trouble, but it just underlines the fact that adding\nto the Hindley-Milner type system is tricky. And compared to some of the things\nOlin and Greg and others in the PL community are thinking about, OCaml’s\nextensions to HM are pretty tame.\n\nThe second way in which ML is in a sweet spot has to do with the execution\nstrategy. In ML you can write code that is reasonably “declarative” (which I\nthink mostly means that you can express your logic cleanly and concisely), and\nat the same time there is a straight-ahead execution strategy that allows you to\nrun the code reasonably efficiently. 
You can make things a little more\ndeclarative, but you often give up quite a bit on the predictability of the\nperformance you get out of your code. There are plenty of other languages where\nyou can see this tradeoff play out. Haskell’s laziness allows for a more\ndeclarative style at the expense of making it harder to think about space\ncomplexity (and indeed, Haskell’s designers are aware of this. Simon Peyton\nJones likes to say that he thinks the\nnext ML should be pure and the next Haskell should be\nstrict). SQL and\nProlog make it possible to specify queries without worrying too much about how\nthe search for an answer is to be performed, at the expense that both time and\nspace complexity become hard to reason about.\n\nML is hardly perfect. But it’s such a nice compromise that it’s going to take a\nreal leap forward on the research side to definitively surpass it.\n",
        "url"      : "https://blog.janestreet.com/the-ml-sweet-spot/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Bind without tears",
        "date"     : "March 26, 2008",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "One of the annoyances of using monads in OCaml is that the syntax is awkward.\nYou can see why if you look at a simple example. Assume that you’re using the\nusual option monad. If we define &gt;&gt;= to be the bind operator, you might end\nup with a piece of code that looks like this:\n\nlet f x_opt y_opt z_opt =\n  x_opt &gt;&gt;= (fun x -&gt;\n        y_opt &gt;&gt;= (fun y -&gt;\n              z_opt &gt;&gt;= (fun z -&gt;\n                    return (x + y + z))))\n\n\n\nThis is awful for two reasons: the indentation is absurdly deep, and secondly,\nand there are too many closing parens at the end.  One solution to\nthis is\n\npa_monad, a camlp4 syntax\nextension that lets you write the same piece of code like this:\n\nlet f x_opt y_opt z_opt =\n  perform\n  x &lt;-- x_opt;\n  y &lt;-- y_opt;\n  z &lt;-- z_opt;\n  return (x + y + z)\n\n\n\nThis is much less painful to look at, but introducing new syntax has a cost, and\nit’s worth seeing if we can make things more livable without resorting to a\nsyntax extension. We can make our original code less eye-gougingly awful by just\nchanging the indentation:\n\nlet f x_opt y_opt z_opt =\n  x_opt &gt;&gt;= (fun x -&gt;\n  y_opt &gt;&gt;= (fun y -&gt;\n  z_opt &gt;&gt;= (fun z -&gt;\n  return (x + y + z))))\n\n\n\nThat still leaves the parens. But as it happens, the parens can simply be\ndropped! This was passed on to me by Adam Chlipala, who got it in turn from\nXavier Leroy at POPL this year. Our code can thus be written as follows:\n\nlet f x_opt y_opt z_opt =\n  x_opt &gt;&gt;= fun x -&gt;\n  y_opt &gt;&gt;= fun y -&gt;\n  z_opt &gt;&gt;= fun z -&gt;\n  return (x + y + z)\n\n\n\nThis seems clean enough to actually use in practice. Now all I need is to coerce\ntuareg into indenting my code\nthis way, and I’ll be happy…\n",
        "url"      : "https://blog.janestreet.com/bind-without-tears/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Variable-argument functions",
        "date"     : "March 20, 2008",
        "authorId" : "sweeks",
        "author"   : "Stephen Weeks",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "Here’s another puzzle:\n\nIs it possible in OCaml to define a variable-argument function? For example, can\none define a function f and values a and z such that the following\nassertions hold:\n\nassert (f z = 0);\nassert (f a z = 1);\nassert (f a a z = 2);\nassert (f a a a z = 3);\n...\n\n\n\nOnce you’ve got that, how about generalizing it to a variable-argument sum, i.e.\ndefine\n\nf, a, and z such that:\n\nassert (f (a i1) (a i2) ... (a ik) z = i1 + i2 + ... + ik);\n\n\n\nOr, if you want to eliminate the parens, define an f, a, and z such that:\n\nassert (f a i1 a i2 ... a ik z = i1 + i2 + ... + ik);\n\n\n",
        "url"      : "https://blog.janestreet.com/variable-argument-functions/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Typing RPCs",
        "date"     : "March 15, 2008",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 13,
        "content"  : "At Jane Street, we end up writing lots of messaging protocols, and many of these\nprotocols end up being simple RPC-style protocols, i.e., protocols with a\nclient and a server, where communication is done in a simple query/response\nstyle.\n\nI’ve always found the writing of these protocols rather unsatisfying, because I\ncould never find a clean way of writing down the types. In the following, I’d\nlike to describe some nice tricks I’ve learned recently for specifying these\nprotocols more cleanly.\n\nA Simple Example\n\nI’ll start with a concrete example: a set of RPCs for accessing a remote\nfilesystem. Here are the signatures for a set of functions that we want to make\navailable via RPC.\n\ntype path = Path of string list with sexp \ntype 'a result = Ok of 'a | Error of string with sexp\n\nval listdir : path -&gt; string list result\nval read_file : path -&gt; string result\nval move : path * path -&gt; unit result\nval put_file : path * string -&gt; unit result\nval file_size : path -&gt; int result\nval file_exists : path -&gt; bool \n\n\n\nThe with sexp appended to the end of the type definitions comes from the Jane\nStreet’s publicly available sexplib macros. These macros generate functions\nfor converting values to and from s-expressions. This is fantastically helpful\nfor writing messaging protocols, since it gives you a simple hassle-free\nmechanism for serializing values over the wire. (Unfortunately, s-expression\ngeneration is not the fastest thing in the world, which is why we’ve written a\nset of binary serialization macros for high-performance messaging applications,\nwhich we intend to release.)\n\nThe usual next step would be to build up two types, one for the queries and one\nfor the responses. 
Here’s what you might write down to support the functions\nshown above.\n\nmodule Request = struct \n  type t = | Listdir of path \n      | Read_file of path \n      | Move of path * path \n      | Put_file of path * string \n      | File_size of path \n      | File_exists of path \n  with sexp \nend\n\nmodule Response = struct \n  type t = | Ok \n  | Error of string \n  | File_size of int \n  | Contents of string list \n  | File_exists of bool \n  with sexp  \nend \n\n\n\nIn some ways, this is great. The types are simple to write down and understand,\nand you get the wire protocol virtually for free from the s-expression\nconverters. And both the server and the client code are pretty easy to write.\nLet’s look at how that code might look.\n\nFirst, let’s assume we have functions for sending and receiving s-expressions\nover some connection object, with the following signature:\n\nval send : conn -&gt; Sexp.t -&gt; unit \nval recv : conn -&gt; Sexp.t \n\n\n\nThen the server code should look something like this:\n\nlet handle_query conn = \n  let module Q = Request in \n  let module R = Response in \n  let query = Q.t_of_sexp (recv conn) in \n  let resp = \n    match query with \n    | Q.Listdir path -&gt; \n      begin match listdir path with \n      | Ok x -&gt; R.Contents x \n      | Error s -&gt; R.Error s \n      end \n    | Q.Read_file path -&gt; \n    . \n    . \n    . 
\n  in \n  send conn (R.sexp_of_t resp) \n\n\n\nAnd the client code could look something like this:\n\nlet rpc_listdir conn path = \n  let module Q = Request in \n  let module R = Response in \n  send conn (Q.sexp_of_t (Q.Listdir path)); \n  match R.t_of_sexp (recv conn) with \n  | R.Contents x -&gt; Ok x \n  | R.Error s -&gt; Error s \n  | _ -&gt; assert false \n\n\n\nUnfortunately, to make this all work, you’ve been forced to turn your type\ndefinitions sideways: rather than specifying for each RPC a pair of a request\ntype and a response type, as you do in the specification of an ordinary function\ntype, you have to specify all the requests and all the responses at once. And\nthere’s nothing in the types tying the two sides together. This means that there\nis no consistency check between the server code and the client code. In\nparticular, the server code could receive a File_size query and return\nContents, or Ok, when really it should only be returning either a\nFile_size or Error, and you would only catch it at runtime.\n\nSpecifying RPCs with Embeddings\n\nBut all is not lost! With just a little bit of infrastructure, we can specify\nour protocol in a way that ties together the client and server pieces. The first\nthing we need is something that we’re going to call an embedding, but which\nyou might see referred to elsewhere as an embedding-projection pair. An\nembedding is basically a pair of functions, one for converting values of a given\ntype into some universal type, and the other for converting back from the\nuniversal type. (For another take on universal types, take a look at\nthis post from Steven). The universal type we’ll use is\nS-expressions:\n\ntype 'a embedding = { inj: 'a -&gt; Sexp.t; \nprj: Sexp.t -&gt; 'a; } \n\n\n\nIt’s worth noting that the projection function is always going to be partial,\nmeaning it will fail on some inputs. 
In this case, we’ll encode that partiality\nwith exceptions, since our s-expression macro library generates conversion\nfunctions that throw exceptions when a value doesn’t parse. But it’s often\nbetter to explicitly encode the partiality in the return type of the projection\nfunction.\n\nWe can now write up a type that specifies the type of the RPC from which we can\nderive both the client and server code.\n\nmodule RPC = struct \n  type ('a,'b) t = { \n    tag: string; \n    query: 'a embedding;\n    resp: 'b embedding;\n  } \nend \n\n\n\nHere’s how you could write the RPC.t corresponding to the listdir\nfunction:\n\nmodule RPC_specs = struct \n  type listdir_resp = string list result with sexp \n  let listdir = { RPC. \n    tag = \"listdir\"; \n    query = { inj = sexp_of_path; \n          prj = path_of_sexp; }; \n    resp = { inj = sexp_of_listdir_resp; \n          prj = listdir_resp_of_sexp; }; \n  }\n\n  .\n  .\n  .\n\nend \n\n\n\nOne slightly annoying aspect of the above code is that we had to define the type\nlistdir_resp purely for the purpose of getting the corresponding s-expression\nconverters. At some point, we should do a post on type-indexed values to explain\nhow one could get around the need for such a declaration.\n\nNote that the above specifies the interface, but not actually the function used\nto implement the RPC on the server side. The embeddings basically specify the\ntypes of the requests and responses, and the tag is used to distinguish\ndifferent RPCs on the wire.\n\nAs you may have noticed, an ('a,'b) RPC.t corresponds to a function of type\n'a -&gt; 'b. We can put this correspondence to work by writing a function that\ntakes an ('a,'b) RPC.t and an ordinary function of type\n\n'a -&gt; 'b\n\n\n\nand produces an RPC handler. We’ll write down a simple implementation below.\n\ntype full_query = string * Sexp.t with sexp \n(* The first part is the tag, the second half is the s-expression for the arguments to the query. 
We only declare this type to get the s-expression converters *)\n\nmodule Handler : sig \n  type t \n  val implement : ('a,'b) RPC.t -&gt; ('a -&gt; 'b) -&gt; t \n  val handle : t list -&gt; Sexp.t -&gt; Sexp.t \nend\n = \n struct \n  type t = { tag: string; \n        handle: Sexp.t -&gt; Sexp.t; }\n\nlet implement rpc f = \n  { tag = rpc.RPC.tag; \n  handle = (fun sexp -&gt; \n    let query = rpc.RPC.query.prj sexp in \n    rpc.RPC.resp.inj (f query)); }\n\nlet handle handlers sexp = \n  let (tag,query_sexp) = full_query_of_sexp sexp in \n  let handler = List.find ~f:(fun x -&gt; x.tag = tag) handlers in \n  handler.handle query_sexp \nend \n\n\n\nUsing the RPC.t’s we started writing as part of the RPC_specs module, we can\nnow write the server as follows:\n\nlet handle_query conn = \n  let query = recv conn in \n  let resp = \n    Handler.handle [ Handler.implement RPC_specs.listdir listdir; \n            Handler.implement RPC_specs.read_file read_file; \n            Handler.implement RPC_specs.move move; \n            Handler.implement RPC_specs.put_file put_file; \n            Handler.implement RPC_specs.file_size file_size; \n            Handler.implement RPC_specs.file_exists file_exists;] \n    query \n  in \n  send conn resp \n\n\n\nAnd we can implement the client side just as easily.\n\nlet query rpc conn x = \n  let query_sexp = rpc.RPC.query.inj x in \n  send conn (sexp_of_full_query (rpc.RPC.tag,query_sexp)); \n  rpc.RPC.resp.prj (recv conn)\n\nmodule Client : sig \n  val listdir : path -&gt; string list result \n  val read_file : path -&gt; string result \n  val move : path * path -&gt; unit result \n  val put_file : path * string -&gt; unit result \n  val file_size : path -&gt; int result \n  val file_exists : path -&gt; bool \nend\n = \nstruct \n  let listdir = query RPC_specs.listdir \n  let read_file = query RPC_specs.read_file \n  let move = query RPC_specs.move \n  let put_file = query RPC_specs.put_file \n  let file_size = query RPC_specs.file_size \n  let file_exists = query RPC_specs.file_exists \nend 
\n\n\n\nPleasantly, the signature of the Client module is exactly the same as the\nsignature of the underlying functions we’re exposing via RPC.\n\nTo be clear, this is far from a complete implementation – particularly notable\nis the weak error handling, and we haven’t said anything about how to deal with\nversioning of the protocol. But even though the implementation we’ve sketched\nout is a toy, we think this approach scales well to a full implementation.\n\nThere are still some problems. Although we’ve added static checks for some\nerrors, we’ve eliminated some others. For instance, it’s now possible for the\nuser to specify multiple RPC.t’s with the same tag, and there’s no guarantee\nthat the server has exhaustively implemented all of the expected RPC.t’s. I’m not\naware of a clean way of getting all of these static checks working cleanly\ntogether in the same implementation.\n",
        "url"      : "https://blog.janestreet.com/typing-rpcs/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Using let module for matching",
        "date"     : "March 14, 2008",
        "authorId" : "sweeks",
        "author"   : "Stephen Weeks",
        "tags"     : [],
        "minsToRead" : 2,
        "content"  : "In OCaml, referring to constructors defined in other modules can be somewhat\nawkward. Suppose we have a module like the following.\n\nmodule Example = struct\n  type t = Foo | Bar | Baz\nend\n\n\n\nTo write a function that pattern matches on values of type Example.t we could\ndirectly refer to the variants as follows.\n\nlet f e =\n  match e with\n  | Example.Foo -&gt; ...\n  | Example.Bar -&gt; ...\n  | Example.Baz -&gt; ...\n\n\n\nThat is pretty verbose. We could alleviate the problem by opening Example.\n\nopen Example\nlet f e = match e with\n  | Foo -&gt; ...\n  | Bar -&gt; ...\n  | Baz -&gt; ...\n\n\n\nThat is nicer to look at, but the open potentially brings a lot of things into\nscope (and not just for f, but for the rest of the file). Using open is\ngenerally bad style because it makes it hard for a reader to connect definitions\nwith uses. The open would be less problematic if we could reduce its scope. We\ncan do that by using a local module.\n\nlet f e =\n  let module M = struct\n    open Example\n    let res =\n    match e with\n    | Foo -&gt; ...\n    | Bar -&gt; ...\n    | Baz -&gt; ...\n  end in\n  M.res\n\n\n\nThat’s pretty verbose too. The approach we’ve settled on at Jane Street is to\nuse let module to rebind the module to a short name, thereby making the code\nconcise and avoiding the open entirely.\n\nlet f e =\n  let module E = Example in\n  match e with\n  | E.Foo -&gt; ...\n  | E.Bar -&gt; ...\n  | E.Baz -&gt; ...\n\n\n",
        "url"      : "https://blog.janestreet.com/using-let-module-for-matching/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Extracting an exception from a module",
        "date"     : "March 11, 2008",
        "authorId" : "sweeks",
        "author"   : "Stephen Weeks",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "The Unix module defines the Unix_error exception constructor.\n\nmodule Unix : sig\n  exception Unix_error of error * string * string\n  ...\nend\n\n\n\nSuppose you want to create your own My_unix module that defines some Unix\nutility functions and exports the same Unix_error. How would you do it?\n You can’t redeclare\n\nUnix_error, since that would make a new constructor, which won’t match\nUnix.Unix_error.\n\nmodule My_unix = struct\n  exception Unix_error of error * string * string (* a new exception *)\n  ... my unix functions ...\nend\n\n\n\nYou could include the whole Unix module, but that pollutes the namespace of\nMy_unix unnecessarily.\n\nmodule My_unix = struct\n  include Unix\n\n  ... my unix functions ...\nend\n\n\n\nA trick to bring just the exception constructor you want into scope is to use a\nconstrained include of the form include (M : sig ... end).\n\nmodule My_unix = struct\n  include (Unix : sig exception Unix_error of Unix.error * string * string end)\n\n  ... my unix functions ...\nend\n\n\n\nThis does require duplicating the exception declaration in the signature, but\nthe type checker will of course guarantee that the declaration you write matches\nthe original, so there is no real chance for error.\n",
        "url"      : "https://blog.janestreet.com/extracting-an-exception-from-a-module/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "A universal type?",
        "date"     : "March 7, 2008",
        "authorId" : "sweeks",
        "author"   : "Stephen Weeks",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "Is it possible in OCaml to implement a universal type, into which any other type\ncan be embedded? More concretely, is possible to implement the following\nsignature?\n\nmodule type Univ = sig\n  type t\n  val embed: unit -&gt; ('a -&gt; t) * (t -&gt; 'a option)\nend\n\n\n\nThe idea is that t is the universal type and that embed () returns a pair\n(inj, prj), which inject to and project from the universal type. Projection is\npartial (returns an option) because injection is not surjective.\n\nHere is an example of how to use `Univ’.\n\nmodule Test (U : Univ) = struct\n  let (of_int, to_int) = U.embed ()\n  let (of_string, to_string) = U.embed ()\n  let r : U.t ref = ref (of_int 13)\n  let () = begin\n    assert (to_int !r = Some 13);\n    assert (to_string !r = None);\n    r := of_string \"foo\";\n    assert (to_int !r = None);\n    assert (to_string !r = Some \"foo\");\n  end\nend\n\n\n\nTry it for yourself and see if you can implement module Univ : Univ so that\nTest (Univ) passes. No Obj.magic or other unsafe features allowed!\n",
        "url"      : "https://blog.janestreet.com/a-universal-type/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "Talking at Penn",
        "date"     : "March 5, 2008",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 1,
        "content"  : "I just got back from an enjoyable visit at Penn. I gave a version of my POPL\ntalk for an audience\nconsisting in large part of students taking Benjamin Pierce’s advanced\nprogramming\nclass, which is\nbeing done in Haskell with a little bit of ML. I also got a chance to chat with\nsome of the PL faculty and grad students and to hear what people are up to on\nthe research front.\n\nIt was a fun afternoon. I hope among other things that it stirs up some more\ninterest (and proposals) for this year’s OCaml Summer\nProject.\n\nI also spoke with Benjamin about the evolution of their intro programming\ncourse. A few years back they were teaching it in OCaml. Then, for all sorts of\nperfectly understandable reasons, they ended up moving the course to Java. This\ndespite the fact that Benjamin’s feeling was that the students ended up better\nprepared for thinking about Java when the intro course was focused more on\nOCaml.\n\nIt all makes sense, but it is still too bad to see one of the few places in the\nUS really teaching functional programming as an early part of the curriculum\ngive up on it. Maybe it will get resurrected at some point. That said, Benjamin\nalso pointed out that there are advantages to teaching ML or Haskell to an\nadvanced programming class, in that you get to hit the students with it when\nthey’re really ready to appreciate the power of the approach. It certainly seems\nlike they put together a good set of students this year.\n",
        "url"      : "https://blog.janestreet.com/talking-at-penn/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "We've got a blog!",
        "date"     : "February 28, 2008",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 0,
        "content"  : "Jane Street finally has a blog! Jane Street is one of the biggest commercial\nusers of OCaml, and we like to think that we’ve picked up a few tricks over the\nyears. In addition to putting down our random musings, we plan to use this space\nto share what we’ve learned. We hope that over time this turns into a useful\nresource for the larger OCaml community.\n\nWe’ll get our first real post out soon.\n",
        "url"      : "https://blog.janestreet.com/weve-got-a-blog/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    },
    
    
    
    {
        "title"    : "HOWTO: Static access control using phantom types",
        "date"     : "February 28, 2008",
        "authorId" : "yminsky",
        "author"   : "Yaron Minsky",
        "tags"     : [],
        "minsToRead" : 9,
        "content"  : "We thought that phantom types would be an appropriate topic for our first real\npost because they are a good example of a powerful and useful feature of OCaml\nthat is little used in practice.\n\nIn this post, I’ll cover a fairly simple use of phantom types: enforcing a\ncapability-style access-control policy. In particular, I’ll describe how you can\ncreate easy to use read-only handles to a mutable data structure. We’ll explore\nthis using the example of an int ref. The int ref is a toy example, but the\nsame approach can be used for more realistic cases, such as a string library or\na database interface.\n\nWe’ll start by implementing an int ref module on top of OCaml’s built-in ref.\n\nmodule Ref : sig\n  type t\n  val create : int -&gt; t\n  val set : t -&gt; int -&gt; unit\n  val get : t -&gt; int\nend\n =\nstruct\n  type t = int ref\n  let create x = ref x\n  let set t x = t := x\n  let get t = !t\nend\n\n\n\nThe simplest way of getting a read-only handle is to create another module with\na different, more constrained signature.\n\nmodule RORef : sig\n  type t\n  val import : Ref.t -&gt; t\n  val get : t-&gt; int\nend\n =\nstruct\n  type t = Ref.t\n  let import x = x\n  let get = Ref.get\nend\n\n\n\nAn RORef.t is just a Ref.t underneath, but the signature hides that fact by\nmaking the RORef.t abstract. Note that there is a function for converting\nRef.t’s to RORef.t’s (import), but not the other way around. This gives\nyou a way to create the read-only handle, but prevents someone with such a\nhandle from recovering the underlying read-write handle. The downside to this\napproach is that it is impossible to write code that is polymorphic over\nRef.t’s and RORef.t’s, even if that code only uses the functionality common\nto both, i.e., if it only reads.\n\nA better solution is to use a phantom type to encode the access control rights\nassociated with a particular value. But what is a phantom type? 
The definition\nunfortunately makes it sound more complicated than it is. A phantom type is a\ntype that is used as a parameter to another type (like the int in int list),\nbut which is unused in the actual definition (as in type 'a t = int). The fact\nthat the phantom parameter is unused gives you the freedom to use it to encode\nadditional information about your types, which you can then convince the type\nchecker to keep track of for you. Since the phantom type isn’t really part of\nthe definition of the type, it has no effect on code-generation and so is\ncompletely free at runtime. The way in which you convince the type-checker to\ntrack the information you’re interested in is by constraining the appearance of\nthe phantom types using a signature.\n\nIt’s easier to understand once you look at an example.\n\ntype readonly\ntype readwrite\n\nmodule PRef : sig\n  type 'a t\n  val create : int -&gt; readwrite t\n  val set : readwrite t -&gt; int -&gt; unit\n  val get : 'a t -&gt; int\n  val readonly : 'a t -&gt; readonly t\nend\n =\nstruct\n  type 'a t = Ref.t\n  let create = Ref.create\n  let set = Ref.set\n  let get = Ref.get\n  let readonly x = x\nend\n\n\n\nIn the above code, the phantom type tells you what your permissions are. A\nreadwrite PRef.t can read and write, and a readonly PRef.t can only read.\nNote that the get function doesn’t pay any attention to the phantom type,\nwhich is why get can be used with both readwrite and readonly PRef.t’s.\nThe only function that can modify a ref is set, and that requires a\nreadwrite PRef.t.\n\nNote that the types readonly and readwrite have no definitions. They look\nlike the declaration of an abstract type, except that these definitions are not\nin a signature. They’re actually examples of uninhabited types, i.e., types\nwithout associated values. The lack of values presents no problems here, since\nwe’re using the types only as tags.\n\nThe great thing about this approach is how seamlessly it works in practice. 
The\nuser of the library can write things in a natural style, and the type system\npropagates the access-control constraints as you would expect. For example, the\nfollowing definitions\n\nlet sumrefs reflist =\n  List.fold_left (+) 0 (List.map PRef.get reflist)\n\nlet increfs reflist =\n  List.iter (fun r -&gt; PRef.set r (PRef.get r + 1)) reflist\n\n\n\nwill be given the following inferred types\n\nval sumrefs : 'a PRef.t list -&gt; int\nval increfs : readwrite PRef.t list -&gt; unit\n\n\n\nIn other words, the first function, which only reads, can operate on any kind of\nref, and the second, which mutates the refs, requires a readwrite ref.\n\nThere is one problem with the access control policy we implemented above, which\nis that there is no clean way of guaranteeing that a given value is immutable.\nIn particular, even if a given value is readonly, it doesn’t preclude the\nexistence of another readwrite handle to the same object somewhere else in the\nprogram. (Obviously, immutable int refs are not a particularly compelling\napplication, but having both mutable and immutable versions makes sense for more\ncomplicated data types, such as string or arrays).\n\nBut we can get immutable values as well by making the phantom types just\nslightly more complicated.\n\ntype readonly\ntype readwrite\ntype immutable\n\nmodule IRef : sig\n  type 'a t\n  val create_immutable : int -&gt; immutable t\n  val create_readwrite : int -&gt; readwrite t\n  val readonly : 'a t -&gt; readonly t\n  val set : readwrite t -&gt; int -&gt; unit\n  val get : 'a t -&gt; int\nend\n =\nstruct\n  type 'a t = Ref.t\n  let create_immutable = Ref.create\n  let create_readwrite = Ref.create\n  let readonly x = x\n  let set = Ref.set\n  let get = Ref.get\nend\n\n\n\nImportantly, there’s no way for an IRef.t to become immutable. 
It must be\nimmutable from birth.\n\nExtra credit: Making it more polymorphic\n\nOne thing that’s notable about the IRef signature is that there is no way of\ncreating an actual polymorphic IRef.t. The two creation functions both create\nvalues with specific tags, immutable or readwrite. These specialized create\nfunctions aren’t strictly necessary, though. We could have instead written\nIRef with the following signature.\n\nsig\n  type 'a t\n  val create : int -&gt; 'a t\n  val set : readwrite t -&gt; int -&gt; unit\n  val get : 'a t -&gt; int\n  val readonly : 'a t -&gt; readonly t\nend\n\n\n\nThe user can force the creation of an immutable or readwrite Ref by adding a\nconstraint. So, you could get the effect of\n\nlet r = IRef.create_immutable 3\n\n\n\nby instead writing\n\nlet r = (IRef.create 3 : immutable IRef.t)\n\n\n\nThe advantage of the polymorphic create function is straightforward: it allows\nyou to write functions that are more polymorphic, and therefore more flexible.\nFor instance, you could write a single function that could create, depending on\ncontext, an array of readwrite refs, an array of readonly refs, or an array of\nimmutable refs.\n\nThe downside is that it may require more type annotations when you do want to be\nexplicit about the permissions, and it also allows some weird types to come up.\nIn particular, you can create an IRef.t with any phantom parameter you want!\nNothing stops you from creating a string IRef.t, even though string doesn’t\nmake any sense as an access-control tag. Interestingly, the signature doesn’t\nactually make any reference to the immutable type, and in fact, using any\nphantom parameter other than readonly and readwrite makes the ref immutable.\nThe access control restrictions still work in roughly the way you would expect,\nbut it is still a bit harder to think about than the original signature.\n",
        "url"      : "https://blog.janestreet.com/howto-static-access-control-using-phantom-types/",
        "image"    : null,
        "topic"    : ["technology"],
        "type"     : "blog-post",
        "source"   : "Tech Blog"
    }
    
]
