Jane Street
Tech Talks

April 15, 2017

Seven Implementations of Incremental

Yaron Minsky
Jane Street

A talk about the history of Incremental, a library that makes it easier to build efficient on-line algorithms. This is based on the idea of self-adjusting computations, introduced by Umut Acar et al. This talk describes the work required to take these ideas from their form in the academic literature to a useful tool in a practical setting.

Yaron Minsky

Yaron Minsky joined Jane Street back in 2002, and claims the dubious honor of having convinced the firm to start using OCaml. He also spends way too much time teaching his kids how to program.

Transcript

So, I want to give you guys a talk about a library that we use internally here, called Incremental. Only some of you have actually used Incremental, so I’ll talk a little bit about what Incremental is, and how it works, and what problems it’s trying to solve. But most of the talk is really going to focus on the long and winding road from the initial implementation of Incremental that we started with, and the one that we’ve ended up with now.

It’s sort of an interesting story, because it’s lots of different concerns got mixed up there. The ideas behind Incremental come out of academic work on functional programming, and algorithms, and stuff like that. There’s a bunch of stuff that came from those papers that we screwed up, and that the academics got right in our initial versions. There’s a bunch of things that we got right, that the academics didn’t pay any attention to.

The several year long history of this project is an interesting back and forth, between what you get from the standard academic approach to looking at these problems, and what you get when you’re trying to use it for real work, which is the experience that we have here.

Okay. So, let me say a little bit about what Incremental is, what it’s useful for. The basic goal of incremental is to allow you to build big, complicated computations. Computations that depend on lots of data, that are efficient, and are specifically efficient in the sense that, when only small amounts of your input data change, you want to be able to efficiently refresh that computation, and get a clean version of the output out of the end of it. That’s the fundamental goal of Incremental.

It’s, in many ways, similar to the computational model that you expect from a spreadsheet. We don’t normally think of spreadsheets as programming languages, but they kind of are. They’re different from the programming languages we’re all used to. It’s interestingly a functional programming language. After all, what do you put into the cells of a spreadsheet, other than simple expressions that are just a pure computation based on some other set of inputs?

The spreadsheet world is kind of inverted what we think of from ordinary programming, in ordinary programming code is king. That’s what you look at. Then there’s some data that operates on in the background, and that’s not so visible. In spreadsheets, it’s the reverse. The data is the primary thing you see, and then the logic of the program is kind of hidden in those cells that are a little harder to look at.

But separating from that part of it, the other interesting thing about spreadsheets is that they give you a way of expressing a computation that structured, essentially, as a dependency graph. Every equation that’s in a cell has some other set of inputs from which its value is derived.

Excel is very sensitive to this structure and tries to be efficient, so that if you have some really big, complicated spreadsheets, and we have really big and complicated spreadsheets at Jane Street; Excel does a lot of work to update those efficiently. So, if some of it changes, some of the cells are modified, then Excel only re-fires the part of the dependency graph that needs to be re-fired.

So, this turns out to be a very effective programming paradigm, both because it’s a nice way of building these efficient algorithms, without a lot of work to do so. The other thing that’s good about it is, it also gives you a natural way to build visualizations of what’s going on. One of the great things about spreadsheets is, you can create a little monitoring sheet, which has references to various intermediate parts of your computations, and displays them in one place.

This is kind of easy, and natural, and efficient to do, because the intermediate parts of your computations are rarefied. They have an explicit place in the world that you can refer to. That makes all sorts of other things that you want to do later easier. Incremental’s going to give us a version of that as well, but within the bounds of OCaml.

So, I guess what I want to start doing is, start laying out in a little bit more detail, what really precisely is the computational model that Incremental gives you. I’m going to do that by starting, by bringing up the OCaml interface, that looks something like the interface of the first version of Incremental that we built.

Let’s make that fit the screen. So, obviously, it’s easier to understand this if you are comfortable looking at OCaml signatures. But I’m going to go through piece by piece and try to unpack what the meaning of the different parts are.

So, you can ignore the module type S at the beginning and end. That’s just a wrapper to give a name to the signature. But there’s this main module, Incr. That’s the Incremental itself. It’s a parametrized type. You can have an Incr.T that contains an integer, or a floating point value, or a string, or an array of strings, or whatever type of structure type you want can sit inside that tick A of the Incremental.

Then we have a bunch of basic operators that you can use for constructing new incrementals, based on the incrementals you already have in your hand. So, the simplest one is return. Return does, in some sense, the minimal thing you could imagine doing, which is, return takes an arbitrary value and gives you back an Incremental computation that is always equal to that value. Another natural name for this function is const, return to the constant computation.

Map is another interesting one. Map takes an incremental of some type A, and a function from A to B, and gives you a new incremental, whose contents is of type B. So, I’m going to start drawing pictures on the wall. So, you can imagine that you start with something that has As in it. You call Map, and it gives you a new one, which has Bs in it. There’s some function, F, that takes you from A to B.

If you want to think about what is the living, breathing graph of this computation, we’re going to start seeing those appear on the wall. So here, you imagine every time this A changes, it’s going to cause this function F to re-fire, to recompute the B.

So far, so good? All right. So, this already gives us a little bit of expressive power. We can now start taking this map, and start growing the computations by doing lots of maps at different points in the hierarchy.

It’s worth noting, however, that there’s something quite limiting about this. We can add new nodes and branch out, but we have no way, yet, of pulling back in. So, that’s not the most exciting kind of computational graph, if for nothing else, like if you all start this with one thing, and then this thing fires, everything’s going to change. It’s kind of boring that way.

But we can make it a little bit more interesting by adding functions that bring things back together. So, map 2 is the first one. The signature for map 2 is a little harder to parse, but map 2 takes two incrementals, an A incremental and a B incremental, and then a function that takes an A and a B, and returns a C, and it gives you back a C incremental.

Pretty straightforward. We can use that to start bringing things back together. So, we can make lots of interesting sub-computations, and then we can merge them together into something, if we want.

In fact, once you have map 2, you effectively have map 3, and map 4, and map 5, by just combining multiple map 2s together. So, map gave us a real new bit of expressivity, and map 2 gave us another one. From there, there’s a bunch of other things you can build on top of it.

The world that you’re in now, when you just have these operators, is essentially the world where you can build more or less arbitrary, static dependency graphs, directed acyclic graphs. You can add more nodes, and keep on growing them, and branch out, and come back in. You can build basically an arbitrary dag. But the dag is in some sense static. It’s not totally static in the sense that, here, I can call these functions and add new nodes. But it is static with respect to the input data.

So, if I have this input here, and I change it, no matter what, it will never change the structure of the dependency graph. In fact, I can have multiple different inputs. I can have lots of inputs that feed into this graph. That doesn’t really change anything. It’s still static in the sense that we described before.

But we can make it dynamic. That’s what the next operator does, bind. Bind is actually subtly different from map. It’s easy to not even notice the difference, but it turns out the difference really matters a lot. So, the signature of bind is, it takes an A incremental, and a function that takes an A and returns a B incremental. Then it returns to you all in a new B incremental.

So, it’s like the only difference between bind and map is the extra T that shows up right here. But otherwise, they are exactly the same. But that small, one letter difference it turns out, has a lot of expressive power. The thing it’s going to give us is dynamism, the ability to change the computations as we go.

So, the way you can visualize a bind node, a bind node has a left hand side and then a right hand side. The idea is the left hand side of the bind node is the first input, sometimes the input to the bind. Then the right hand side is going to be what’s produced by this function here.

So, the idea is, when you create a bind node, it has an input. That’s this left hand side. Then, depending on what this is, it can pick any of a number of different nodes to wire in as the right hand side. It can change its mind as it goes because it can choose different incrementals, depending on what the input is. In fact, as we’ll see, you can in fact even create new incrementals in that context.

So, this is all a little abstract. Let me make this difference a little bit more concrete by looking at how you might implement if. So, this is a tiny piece of code using the Incremental interface I just showed you, which implements if in two different ways; once with map, and once with bind. As we’ll see, even though they produce the same answer, they have different graphs, and therefore, different performance behaviors.

So, this is nonsense. This is just me implementing map 3 on top of map 2. I told you I could do it before. That’s actually the code to do it. It’s not super interesting, but it just makes the rest of this work. Then my first implementation of if is going to use map, particularly this map 3.

So, I’m going to call map 3 on, a variable representing the condition, like the boolean, true or false, the then clause and the else clause. Then what am I going to do with the condition, the then, and the else? I’m going to say, “If the condition is true, then return T. Else return E.”

So, one thing that’s important but kind of subtle is, what’s the type of C here?

Bool incremental.

Right. So, bool incremental. What’s the type of C here or here? It’s just a bool. That’s right.

Even though we’re using, we’re somewhat abusing the terminology here, and we’re using the same variables in different places, it’s the kind of unwrapped version that we can then use for a straight up computation.

So, the dependency graph here is actually really simple. All we do is, we basically have a node with three inputs; the condition, the then, and the else. Anytime any of them changes, we recompute. We say, “If it’s true, return T. Else, return E.”

The only part of this that’s problematic is the performance behavior. Which is to say, let’s say this condition changes rarely. Let’s say it’s currently true. Then in some sense, once you know this, you know that you don’t actually care about what’s going on here. You don’t really need to re-fire an if statement when this changes, only when this changes. But the map if implementation is not aware of that, so it recomputed every, single time, when either branch changes, no matter what the condition is.

So, the bind implementation here has the other behavior. So, for bind if, we bind on C, on the condition and then, if it’s true, then we return T. Otherwise, we return E. But in this case, T and E are incrementals. So, what bind is doing in this case is choosing between two possibilities. So, in the bind case, instead of having all three wired in, we have either the true case wired in, or we have the else case. Either than, or the else case wired in, but never both.

So, that maybe gives you a little bit more of a sense between a map and a bind. Do people have any questions about this? It’s easy to get confused. It’s like an interesting and quite different library.

Okay. So, I guess the other thing I want to say about bind is, this use of bind in some ways undersells how important bind is. Because it makes it sound like just a small optimization about the firing structure of the resulting graph. But it’s actually a much bigger deal than that, because in addition to changing where you’re pointing to amongst two existing incrementals, you can also do all sorts of new things. You can construct new incrementals by calling map and bind inside of the right hand side of a bind.

So for example, you could have some computation where there’s an infinite number of possible configurations, and you bind on some description of the configuration you want now. The right hand side constructs the computation for you. For example, there might be some product you want to do, of a bunch of inputs. Which ones are they? That might come in on the right hand side of the bind. Then you construct the whole sub-graph that does that computation for you.

So, bind really does a lot to add to the expressive power of the system, letting you do pretty general purpose dynamism in the middle of your incremental computation.

C fires a lot. Do you prefer map 3?

Yeah, this is a good point.

The whole structure of incrementals is what’s called a monad. It’s a monadically structured library. There’s a standard trade off between map and bind in libraries like this, which is, map is generally speaking, simpler to use, simpler to understand, and simpler to implement.

Bind is more powerful, but worse in all the other ways. That’s true here. Bind certainly adds a lot of complexity to the implementation of Incremental, as we’ll see in a second. It’s also slower, because it involves, in many cases, the allocation of new nodes and more heavyweight operations.

So, yeah. It’s worth saying. These are all baby cases, and you can’t really get a good understanding of why these things are useful by just looking at baby cases. So, for example, here’s one optimization to this, don’t do it incrementally at all. If statement is a tiny amount of work and incrementalizing it isn’t all that valuable, and the difference between map and bind doesn’t matter very much.

In the very small, this is kind of in some sense, not very interesting. But when you get to bigger and more complicated things is where it matters. And you have to worry, when you think about bind, about exactly this problem of how expensive it is.

In fact, a lot of the practical experience of using Incremental is, you have to look at your program in two ways one is, what is the thing it’s trying to do. That’s, in some sense, easy. You just drop all the bind and maps. Just forget about every incremental operation and see what it does as an all at once program. That’s its semantics. That’s its meaning.

But when you want to think about performance, you have to think about exactly where the incrementality lies. You have to think quite hard about the difference between bind and map because it makes a real difference.

All right. So far so good? All right.

So, we talked about the basic interface a little bit. Let me just remind and mention a few other points. So as I said, the basic operations; return, map, and map 2; those only let you build static dags. Bind gives you dynamism. An important part of this is that we’re really thinking about this as a way of optimizing a functional program, something that’s just computing and not doing side effects. That makes everything easier to think about.

Sometimes, you want to do side effects in these, and one can talk about that, but I’m mostly going to ignore that for the purpose of this talk. One other thing that I didn’t mention is this notion of cut off. Which, if you have a big, complicated graph, a lot of the time, you might want to stop the computation earlier than you would imagine if you just see what changed, and then fire the transitive closure, everything that is reachable from the thing you modified. Because sometimes, even though something changed, the output three stages down might not change.

Imagine, for example, you have some node that computes the max of two numbers. If the smaller number changed, it’s going to stop right there. You really want some notion of cutting off a computation when you detect that the input hasn’t changed, or in some cases, hasn’t changed by enough. If you’re computing real valued things, or floating point numbers, you might want to have some tolerance where you say, “Less than 10 to the negative 15th, I don’t care about differences that small.” You may want to cut off a computation in some ways. So, you really need support for cut offs.

One other thing. There’s a couple other parts of the interface I didn’t mention that are important. Everything I’ve talked about, I stopped at bind. So, everything I talked about was about constructing these computations, assuming you have some incrementals to being with. But I didn’t tell you how to use the computation. I didn’t tell you how to get incrementals to begin with, so let’s talk about that.

So, stabilize is a somewhat mysterious looking function. It takes unit and returns unit. What does it do? So, what it actually does is, it runs the whole computation. So, the work flow with an incremental computation is, you build the basic computation. You modify some of the inputs. Then you call stabilize to flush that through the graph, and then we get the outputs back out.

How do we get the outputs out? We have this function, value, which will take an incremental and spit back the value at you. There’s another function which is very useful, called on update, which registers a call back. It tells you when something has changed. This goes back to the use case I was talking about before, where you might have some big computation where you want to put your ginger in multiple spots, so you can display things from it, or otherwise, in other ways react to it. On update lets you register your interest in some value and wait for it to come back to you.

Now we have everything, except we don’t know how to put data into the system yet. That’s a variable, so that’s the last piece of the interface. So, a variable is again, a parametrized type. You can create one with an initial value. You can set it, meaning you can change the value. Then you can call read. What read does is, it gives you back the incremental corresponding to that value.

So, you start building an incremental computation by creating a bunch of variables, reading off the incrementals from those variables, and then using the rest of the combinators that we have for building up the computation itself.

All right. Any questions about this interface? If you don’t understand this interface, you’re hosed for the rest of the talk.

Can you end up with cycles, or is there something [crosstalk 00:19:45]?

Ah, yes. So, the rough story is this. If you do everything in a beautiful, functional style, which we often don’t, then you’re basically guaranteed to not have cycles. If you do some clever things to optimize, like you try and memorize computations that you do a lot so that you don’t have to rebuild them, then it’s easy to come up with cycles, and you have to figure out what to do about that.

That’s actually a serious issues. We’ll talk about how it deals with cycles, interesting and different between different implementations that we went through. Okay. All right.

Let’s get back to my high quality slides. So, I’m going to go through a bunch of different versions; some in more, some in less depth. I’m probably going to talk about this one in the most depth, because it’s the easiest to understand. This is the first version that we did, that Steven Weeks and [inaudible 00:20:36] wrote, when years ago, at this point, we decided we wanted a version of this.

So, the first thing I want to talk about is, what is the stabilization function, because that’s the key thing that we have. It’s the way in which you propagate all the information. So, this computation graph I’ve been describing and drawing on the board, you more or less make that real in OCaml objects. Which you know, you allocate them, and they have pointers to each other in a way that roughly reflects the structure of this dependency graph.

One important thing to understand about it is, you have to have pointers going from the variables, up. When you represent a graph like this, it’s not obvious which direction you need the pointers. But if you think about the performance characteristics you want out of this, for the work that we’re doing, we want to be able to go up. Why is that? Because the new information comes in at the leaves. You want to do the minimal amount of work that you have to do, based on which of the leaves have changed.

So, the basic structure of how any algorithm to do this has to work is, some of your leaves are dirty because they’ve been set. You go through and do the propagation starting from there, and stop when you don’t need to go any further. That requires that, from the leaves, you know where they connected. So, you need these upward pointers.

The algorithm is trickier than you might imagine, if you haven’t thought about it at all, because of what happens when you have, as you often do, recombinant graphs. Meaning the graph fans out, and then comes back in. So, let’s draw a simple example of a recombinant graph.

So, you have to think a little bit hard about how you’re going to fire this structure, because let’s say we do a simple depth first search. We go through and say, “This has changed, so this has to be recomputed, so this has to be recomputed, so that has to be recomputed. Okay, we’re done with that. Okay, since this changed, this also has to be recomputed. Oh, wait. Now we have to recompute this one twice.”

So, you think, “Well, maybe it’s not so bad. You have to do something twice. How big of a deal is that?” But once you get twice, you basically have exponential badness, because you can take examples like this and stack them on top of each other. At each level of the stack, one update becomes two, becomes four, becomes eight, becomes 16.

So, you get an exponential blowup of unnecessary work if you just do a completely naïve walk of the tree. You might think, “Maybe I should do breadth first search, so I get to this thing at the same time.” But I tricked you already, because I made two links here, and only one here. So now you won’t reach it at the same time. So, you need actually something more principled then just changing the order of the search.

So, what’s the solution here? The solution here is to do it in two passes. Pass one is, you figure out what needs to be recomputed. You basically walk through the graph and imagine, there might be other parts of this computation that don’t depend on this node, and don’t need to be re-fired. In fact, you can sort of think about this world here, like this and all of this needs to be redone, but none of these nodes do.

So, one thing you can do is, just do a first pass where you mark the whole transit of closure of this node is dirty. Then when you’re done, you then do the actual propagation of the firing. The key trick here is, when you mark nodes as dirty, you also let them know how many of their inputs are dirty. Then they don’t recompute until the number of dirty inputs they have drops to zero.

So, here if you do the same thing we did before, where we’d walk up here. Then we’d tell this guy that one of his inputs was clean, but it wouldn’t fire yet, because it knows it has one more. It has to still wait. Then we go back and start going this way. Then finally, when we get here, then this guy knows he needs to fire, and then he recomputes and causes the next level up to recompute.

So, that’s the basic algorithm. It’s very simple. It’s quite fast. It’s easy to understand. Everything’s great. Well, not everything’s great, it turns out.

The first problem we have has to do with garbage collection. It turns out lots of pretty, functional programming abstractions that are built, are built on top of horrible, imperative, complicated, messy underneaths that are hard to understand. One of the big reasons they’re hard to understand is how they react with the garbage collector.

So, the particular problem here has to do with the direction of the pointers. We have pointers up from leaves, which means as long as the leaves are alive, everything they point to will remain alive. You might think, “Okay, that seems like a reasonable property,” but you don’t want to keep things alive because they’re connected to inputs. You want to keep these alive because someone is paying attention to them on the output side.

In particular, if you have in the middle of this graph, something dynamic, where there’s a bind node that is allocating new nodes, and dropping references to nodes it created previously, those nodes it created previously are still hooked in up from the leaves. So, if you don’t do anything clever at all, they’re never actually ever going to get collected.

So, you have to do something to take things that are no longer observable in the system, and drop all the pointers to them from the inputs, which have to remain alive because somebody’s feeding data in from the outside into those things, so those are going to be kept alive.

So, we need some way of figuring out what’s going on. The way we do this is by maintaining reference counts from externally held nodes. So, we’re going to keep track of, whenever someone has a node that they hold to on the outside, we’ll do a reference count of the number of times you are referenced from some externally held node. If that reference count drops to zero, then we’re going to cause you to drop, to cut yourself off from the graph so you can be collected.

So, that’s relatively simple. The only issue is, how do you keep track of whether you have external references. Because after all, none of this stuff is ever going to get collected, so the garbage collector isn’t going to tell you anything. So, how do you figure it out? It’s pretty easy. The basic idea is, instead of handing back these actual nodes that are in the internal graph back to outside users, you create an extra little node for each one of these, called a sentinel. The sentinel is just an extra cell that points to the real thing, and isn’t pointed to by anything else in the graph.

What you do is, you attach to every sentinel what’s called a finalizer. A finalizer is just a hook in the garbage collector that, right before it collects an object, it warns you. It says, “This thing’s about to be ripped out.” You use the finalization of the sentinels as a way to trigger the removal of the reference counts. Basically, the things you’re counting references from is the sentinels. So, it’s their arrival and removal that causes you to update the reference counts.

Okay. So far, so good? Does that make sense?

All right. So, everything’s great. It’s a wonderful implementation. There are no problems. Right, I told you about the sentinels already.

So, there’s one small problem with this design, I told you, which is, I talked before about how important being able to cut off the computation is. It’s impossible here, so that’s awkward. Why is it impossible? Because I have this two pass algorithm. The first pass marks what needs to be recomputed and the second pass recomputes it. Never did I say, some way of figuring out whether you needed to be recomputed.

The first pass, where you mark what needs to be recomputed is totally naïve. You can’t look at the values because they haven’t been computed yet. So, you just mark everything that’s in the transitive closure as needing a recomputation. So, if you have things that are important to cut off, there’s just no story here for how that’s going to work.

So, we came up with a story, which is a little bit of an awkward story. We said, “Well, probably, when you create a node that’s explicitly meant to be a cutoff node, we can probably make the assumption that almost always, it cuts off.” Then if we do that, what we can do is, we’re going to run this algorithm to its completion, under the assumption that all cutoffs don’t propagate.

But we’re going to keep a list of the cutoffs that re-fired. Then we’ll check. If any cutoff node changed by enough that the cutoff function says it really should re-fire, then we restart the algorithm again from there, with a round of cutoff nodes.

So, this sounds like it’ll work, and it does work, but it’s also bad, because if you have complicated nested cutoffs in your system, you can re-fire exponentially often. It has the same problem we had before, of the double fire. So, to our credit, we knew from the beginning that this was bad. So, we just used cutoffs in limited cases, at the edge of computations. More complicated internal stuff, we said, “Well, we’re just going to live without cutoffs.”

So, that was a big problem that this system had. It has another quite significant problem that took us longer to figure out. This has to do with our naïve idea about how the garbage collection of unnecessary bits of the graph is going to work. So remember, we talked about bind nodes can allocate new things, and the old things are no longer referred to, but are still hanging around and connected to the inputs until the garbage collector collects their sentinels, so we can update the reference counts and figure out that the things can be removed.

So, this sounds reasonable, but relying on the garbage collector to collect something, you shouldn’t really rely on it to collect it especially soon. The garbage collector collects things when it thinks it’s running out of memory. It could be, really, a long time before it wants to collect it. In the meantime, the thing that wasn’t obvious to us at first is, these nodes that are hanging around are still alive. They’re still kicking. It’s not just there’s extra memory. They’re hooked into the inputs. They’re still computing. They’re running. Every time inputs fire, they’re still firing.

So, that seems a little bad until you think about it more and realize it’s horrifically bad. Because if you have nested sets of binds, this is happening over, and over, and over again. At each level, you’re splitting yourself, and splitting yourself, and splitting yourself. So, you’re actually generating exponential amounts of garbage in the system.

You start with ah and then it’ll eventually be collected. But it turns out, between now and eventually, it’s going to allocate exponential amounts of stuff that you don’t need.

So, this is, in some sense, an obvious disaster in retrospect. We didn’t notice it at first because our very first applications for this did not use lots of nested binds. But when we started building user interfaces with this, it turned out nested binds were really useful. We noticed these weird, very long pauses in the middle, and didn’t know why. It turned out to be due to this.

All right. Any questions? Lots of exciting, confusing stuff going on, so feel free to not understand things. Yes?

How well does it correlate to what was in the paper, the outcomes?

We’ll get there. This implementation is quite different from the one in the paper. Indeed, when I gave the talk at CMU, the guys who were responsible for the usual paper looked a little horrified. I was like, “No, it’s the whole point of the talk! It gets better, I promise.” It really does get better.

But the results were, this was pretty fast. We know the cutoff behavior’s terrifying, and we know that things aren’t collected eagerly enough. But when used within a scope where the available semantics was good enough, it was a really effective piece of software. We actually were able to build systems that were very hard to build without it. It was all pretty efficient. We were all very excited about it, even though there were all these problems that we knew about.

We knew, however, there was a problem. Our first attempt to fix it, we knew about the cutoff problem, but didn’t totally understand this other problem about too much allocation. Our first attempt to fix this problem was to go back, in some sense, and read the papers more carefully, and understand what they did about this. It turned out they had a solution to this. The solution involves time.

So, the basic idea is that you want to have a topological sort of the graph when you’re going through and doing the firing. Because then there’s a simple algorithm for making sure everything is fired only once, that can be done in a way that’s sensitive to cutoffs. Because as you’re going, you can do the full computations, so you really know the values in your hand.

Just to be clear, a topological sort is a labeling of the graph with identifiers that are sortable, so you could have a total order of everything in the graph. Such that this ordering respects the dependency order. Which means you never go forward in the ordering and end up crossing a back edge. You’re always walking forward when you’re walking forward in numbers.

So, the way you do propagation, once you have a top sort, is really easy. You have a heap of things that need to be recomputed but haven’t been, yet. You start with your dirty nodes. You throw them on the heap. Then you start taking things out and propagating, and throwing all the things that get fired onto the heap. Because you’re always pulling it out in the right order, you’ll never visit a node until you’ve recomputed everything that, that node depends on.

So, it sort of guarantees to you that you never refire everything. But you can, at any point, look at a recomputed node and say, “This didn’t change by enough. I want to stop here.” That doesn’t disturb any of the invariants that you have.

So, top sorts are good. They make this kind of graph algorithm simpler. The only problem is, how do you get a top sort? So, there’s a bunch of ways of approaching this, but here’s a really, very naïve, simple way of thinking about a topological sort. Which is, one way we can do it is, we can solve it using a clock.

So, every time we allocate a new node, we’ll look at the clock on the wall and say, “What time is it?” We’ll put that time stamp on the node. There’s an ordering. As long as we’re only using the map, and map 2, and map 3 part of Incremental, it turns out everything works, because every new node only depends on old nodes. So, the time order respects the dependency order, so we’re happy.

So, where does it fall apart? It falls apart when your graphs are dynamic. If you’re just constructing the graph in one early sweep, the time stamp is enough, and in fact, you don’t need a clock. You just use a counter that you upgrade every time. So, that all works, but when you’re in the middle of reallocating nodes in the middle of your graph, it’s not so hot anymore.

So it’s the problems when you’re allocating nodes inside of a bind. That’s when you’re in trouble. So, the idea is to use a different kind of time stamp, a kind of logical time that will solve our problems. The idea is that, instead of picking, you can think of the old way of using a clock is, you look at the set of all of the time stamps that have been already used, and you find a new time stamp that’s bigger than all of those.

So, the alternate role is to say, “I’m going to get a new time stamp in a different way. I’m going to pick a time stamp that’s bigger than everything that is smaller than the bind under which I was allocated.” So, if you were a new guy allocated inside some bind, you always want to be behind that bind. So, you want to be a time stamp that’s before that bind and after everything else that’s before that bind.

That’s basically the rule. In some sense, if you have that rule, if you can allocate time stamps that way, everything almost just works. The only issue is, how do you allocate time stamps that way? If you have a bind node that’s here in the order and then there’s some other stuff behind it, you can allocate something in the middle, here. Then maybe you’re going to need to do something in the middle here.

These are all going to get really, really close together as you allocate more and more of them. So, you might think, “What data structure do I use for these? Maybe I could represent them as real numbers or something,” but that’s not so hot. They’re unbounded in length and the whole thing’s kind of a disaster.

So, it turns out there’s a bunch of clever data structures for doing this that have good asymptotic behavior. The simplest thing that we thought about, which I think works reasonably well is to just use, have your counters allocated in some big space, like just an integer space. When you pick ones, pick ones that are well spaced out to begin with. Then, when you have to go in between, you go in between. When you run out of space, you just flatten stuff out.

It’s almost like a garbage collection step. You wait until you have no space, and then you just redistribute.

So, then you de-compact?

Yeah. You just de-compact, where it’s the opposite of a garbage collector. It’s a negative garbage collector, or something. I don’t know. But it has the same flavor of, you’re building up a problem. You wait until it gets pretty big. Then you have a batch operation that clears it out.

There are other approaches of doing this, too, with clever binary trees and stuff.

[inaudible 00:37:19] go left, go left.

Yeah, yeah. You can use trees for it. It all depends on exactly the performance properties you want. The nice thing about the smooth out one is, it’s really simple to implement. The identifiers themselves are really cheap to compare. You do a lot more comparison than you do allocation. So, there’s a big win there for using int.

But anyway, there’s lots of different tricks, but that’s the basic idea.

So, we went ahead and implemented this. It was pretty easy. We were able to, an intern who didn’t have a ton of experience was able to go in and create a version of this. It worked better and worse. So, it was better in that the semantics made more sense. It was worse in that the performance was worse. It was a lot worse, one and a half to four times worse than the original implementation.

Okay. So, we weren’t super excited about that. That stayed as an intern project, and we did not roll it into production. Then we got to version 3. Version 3 is to solve the other problem that we have, and this is the version that we actually used for a long time, for our big gooey apps that use this.

Sorry. You had a question, Eric?

Sorry. So, you used version 3 for a while, or used version 2?

They’re numbered in different ways. In this talk, I call this one version 3, and I think this is the right time order of the versions.

Anyway, so this version is the one that’s trying to just solve the exponential garbage problem. This is just the problem I talked about before, which is, it takes a long time to collect everything. Those things that keep on running because they’re still connected to the variables are allocating lots of garbage. So, bind infested code gets totally screwed by our original implementation, so this is trying to fix it.

The basic idea is actually relatively simple. Sorry, you have a question?

Yeah. I’m curious. I don’t understand. These two things need to help one another in a certain sense? If you allocate a lot in the finalizer, then you would collect garbage more quickly, I would think.

So, that’s true, but isn’t really good enough. The point being, it’s still exponential. It can be going so fast that you have trouble keeping up with it.

You can argue about lots of things, but anytime you have an actual part of your algorithm which is exponential, it’s very easy for it to trip off into it not working at all.

So, we had a fairly simple approach, which is, we’re going to keep explicit track of what is the observed part of the graph. We do this by minting explicit values, called observers. So, just like the things that were kept alive are the transitive closure of the variables, the things that are needed is the transitive closure of the observers in the opposite direction.

We eagerly keep track of observation changes. So, when a bind node causes one node to, instead of looking at one incremental, looking at another, the one it stopped looking at immediately moves into being unobserved, and the new one immediately moves into being observed.

So, when you have a bind fire and it changes its focus from one area to another, you have this guarantee that only the current, real bit of the computation is actually observed. You have an extra invariant that you make sure that only the observed part of the computation actually churns. You quiesce the unobserved part. You tell it not to compute anymore.

That is basically enough to solve the problem, because the problem wasn’t that we had these things that were around for a little while. It’s that they were still alive and kicking while they were around. So, we kill them a little bit early, before the garbage collector actually gets to them, to stop them from causing trouble.

So, let’s actually look for a second at what the interface looks like with observers.

So, now we have three different components of the interface, instead of just two. There’s the incremental, which is lik it was before, except we’ve removed the part that lets you observe values from it. There’s the variables, which are exactly the same. Now there’s this new thing called an observer, which you can create from an incremental. You can extract values from, you can register on update functions for, and importantly, you can stop observing them.

So, this is the new hook. You see, in some sense, they’re the dual of the variable. They’re the opposite of the variable. They keep track of opposite parts of the graph, and the variable is something that you create from nothing, and then convert to an incremental. The observer is something you can create from an incremental, and then peel information out of.

This part of the interface really does preserve to the modern version of Incremental.

How is it computed if they’re not observed?

It’s essentially a reference count.

Sorry. Does it state that you have to explicitly stop observing things?

It does. Well, sort of. As we’ll see, when an observer is garbage collected, it is automatically unobserved. But just remember how I said before, it’s not a good idea, maybe, to let the garbage collector quiesce things that aren’t needed anymore. So, in practice, you should un-observe things when you don’t need them anymore. Which is not to say that it always happens.

Okay. So, this worked great. We used it a lot. We used it in a bunch of our internal UIs, and it mostly worked, although we recently replaced it and we’re very happy with that replacement.

So now, we’re on to version 4. Version 4 was like really going back and reading the Acar paper carefully. Umut Acar is the main guy behind the original papers in this area. So, this was not a useful implementation, exactly, but it was a good learning experience. We went back and integrated the V2 and V3 things into one implementation. We learned a few extra invariants from the papers, that we needed, about thinking about what depends on what.

We also learned some interesting things about the way in which the original implementation worked. One of the interesting things that struck us is that the original implementation from Acar did not distinguish how map nodes and bind nodes were constructed. In other words, you can implement map in terms of bind, because bind is a more powerful operator. But that turns out to be crushingly expensive, because it involves lots of extra allocation, because you end up allocating new nodes all the time when the graph is actually static, so you don’t really need to do allocation.

So, there are lots of, I think, practical performance problems in those original implementations that turned out not to be necessary, and that we didn’t have in ours, because we specialized map and map implementation to be more efficient.

So, v5 was a more practical attempt to combine v2 and v3. It also added one extra bit of functionality, which was interesting, which was, remember how I described this nice thing with time stamps, how we’re going to implement certain new time stamps before the bind and after everything else, and that’s going to make things work out okay?

So, it turns out, if you’re a good, functional programmer, and you only use good, functional programming primitives, everything works great. But if you aren’t, and you really shouldn’t be, it doesn’t work out so well. In particular, a thing that you very naturally want to do in this kind of context is, memorize computations.

If there’s some sub-graph that you computed, and then some other part of your computation wants to do it, you really want to share the same one. It’s a form of common sub expression elimination, applied to your computation. Meaning, there’s some common sub piece, and you want to actually share the computation from that sub piece, rather than repeating it.

So, the way you implement that is, you have some table somewhere, which keeps track of previous constructions of graphs, so that you can remember them and get them back when you want them again. But this can cause you to pull things out of order, not in the order that was originally intended by this time stamp system.

So, the end result is, you can, even when you’re doing what is the sensible incremental code, you can construct back edges. You asked about cycles before. What happens when you introduce a cycle with a back edge? Well, for the implementations up until here, what happened is, in the original implementation, you just had an infinite loop.

So, the way your program told you that you had introduced a back edge is, you said stabilize, and it never ended. Fail stop. At least it doesn’t send orders, or something terrible.

But that’s obviously not ideal. In the simple, Acar style time stamp thing, it’s just going to say, “Kaboom! You’ve put in an ill founded edge. I’m going to throw an exception and you’re dead.” It turned out in real programs, there were lots of these backward edges, so that was not so hot.

So, we added to this version a sometimes used dynamic top sort algorithm where it’d normally allocate these counters in the ordinary way. But if it saw a back edge, rather than blowing up, it would say, “I don’t know how to do a really good incremental top sort algorithm, but I can just redo the top sort in this case.” So, it’s a little “hackey.” You might worry that there are bad corner cases here, but it kind of worked.

So, this was semantically much better, but the performance was bad. Why was the performance bad? Well, remember I said there’s a heap, and you put the things in the heap, and you take them out? Well, heaps are kind of expensive data structures. The constant isn’t all that low, and they’re log end to remove an element.

So, it costs a fair amount to flow things through a heap. Even after we no longer had the thing an intern did in a few months. We had a brand new implementation that a bunch of good people had worked on for a long time, to try and make work well, but it still didn’t work fast enough to make us comfortable about replacing it.

All right. So, v6 was to eliminate the heap from the system. The core idea is to move a little bit away from the notion of using a total order. The key observation is, we actually never needed the total order. A top sort totally orders all the nodes, but is enough to have a partial order. What is a partial order? It just means there are lots of things that are different that are marked as equal in a partial order, or not comparable to each other. You know that some things are really different, and some things, you can’t tell.

It turns out that’s okay, as long as the partial order reflects the dependency structure well. It’s actually pretty easy to think of partial orders that do that. One example is, the height of a node in the graph. So, how do we define the height? You can simply define it recursively as, if you’re a leaf, it’s zero. Otherwise, it is the maximal height of all of your children plus one. Look at all your kids, see what the max height is, you’re one taller than that.

This also has the nice property that it respects the dependency order. By virtue of being a partial order, we have something with a very small number of valances, a small number of distinct values. So, you can imagine, you could have a graph with hundreds of thousands of nodes, but only 70 different heights.

So now, instead of having to have a heap that might have 100,000 nodes in it, you can just have an array of lists. You can have a very simple constant time low overhead data structure for keeping track of the things that need to be fired, because the valances is small enough you can stick it in your pocket. You can just have a little array that takes care of it.

This turned out to be a good idea. This idea was what we called a pseudo height. So, I said the height of a node is one plus the max of the height of the children. A pseudo height is just like that, except it’s not quite so painful to compute.

So, why is the height bad to compute? The height’s bad to compute because imagine you have a big graph. At the bottom of the graph, there’s a node with a bind, where it sometimes is height 2 and sometimes is height 1. Tick, tick, tick, tick. If that flips back and forth, you have to renumber everything back and forth. So, you’re suddenly taking a thing that should be constant, and making it time linear in the overall size of the graph. So, that’s kind of a disaster.

So, the pseudo height is almost like a height, except it’s memory sensitive, which is to say it never goes down. So your pseudo height is the maximum your pseudo height has ever been, basically. It starts out being the height, and it’s never willing to come down. That, in practice, has worked really well. It also has some weird corners. You can write some terrible data structures, terrible incremental computations, but basically have two nodes that change their mind as to who’s ahead of who. They keep on walking each other up the height graph, but you kind of never see it.

So, this is not a thing we warn people about, especially. You can construct it if you want. If you go back to your desk, you might think about how to do this, but it doesn’t seem to come up and real programs. We’d know, because we’d get errors when we exceed the pre-committed height.

So, this turns out to work really well in practice. This is, again, a property that our current system has.

So, what are results? The performance was now quite good. If you think about the overall performance story, here’s the picture. Imagine up is bad. It’s taking longer. We started off with something pretty good, and then we got something with better semantics, but worse behavior, and a little better, a little worse, hovering around here for several versions. Then finally, with this version, if this was our baseline, we finally brought it down to, maybe just a little bit worse than v1.

So, now we had the good behavior, and the performance was about as good as before. So, we were relatively happy with this. So, we had another nice trick which we wanted. This was in v7. We had kept this structure having finalizers on all of the versions, but it turns out we didn’t really need finalizers everywhere. The finalizers were there, remember, for the sentinels. The sentinels were there for the ref counts, figuring out when you disconnected things so that they could be collected, because they were no longer see-able anywhere.

So, we realized that we could basically just use the observers for this, with a little bit of extra hackery. In particular, we still need to make sure things are collectible when they’re totally unreachable. But we can do it in a somewhat subtler and cheaper way, which is, we just use the observability data to enforce the invariant that there are no pointers from the observed to the unobserved world.

So, if you see a connection to the observed to the unobserved world, which you can check every time your observability changes. You say, “I’m not allowed to have that pointer,” so I cut it off. If it changes in such a way that you need to reestablish the pointer, then you can reestablish it. It helps that you are on the, you have to be on the right side to reestablish it, but that is in fact the case. Because the guy who becomes observed knows who he depends on, and he tells that guy to reestablish the upward pointer.

This now means that the unobserved world is free floating. It’s no longer connected to the variables. So, if no one is using it, then it can actually be collected. So, we basically got to get rid of, it used to be if you had 300,000 nodes in your graph, you had 300,000 finalizers and no more. Now, you might have a big, 300,000 node graph where you have 10 output values, and then you only need 10 finalizers. Finalizers cost something from the collector, so there’s real value there.

Again, also in this version, we discovered some new semantic restraints that we hadn’t realized, having to do again with bind nodes. Bind nodes are really the complicated bit. The thing we realized in this version is that when a bind node fires and you create nodes, then it fires again and you create new nodes, those old nodes are kind of bad. It turns out, it’s not safe to depend on them. So, you have to make sure to obsolete, to kill off nodes and not allow dependencies on those.

So, that was a new thing we discovered in this version. But the real big win for this version was the finalizers. It was great. Now it was finally like we had gotten down to here, so that was pretty good. We were excited. We thought we were done. Then the guys who were working on this, Valentin Gatien-Baron and Nick Chapman, who were the main guys who were doing all these versions of Incremental after the first few, were done by these guys.

So, they handed it back to Steven Weeks for code review. If you’ve ever met Steven, calling it code review, it’s more like code rewriting. Because he wants to make it look like he wants to.

So, v7 was the final version of this process, but then it turns out, we ended up with a v8. Why did we end up with a v8? Well, between the time that we were working on all these versions and the time we did the final review, we had learned new things about how much allocation costs for these kinds of programs, and we’d gotten some new features in the language. In particular, we got what are called generalized algebraic data types. This gives you nice ways of encoding what one calls existentials.

Let me talk a little bit about this. If it sounds like we’re in the philosophical, it almost is.

The idea is basically this. The way that Incremental is initially implemented is kind of tricky in that, you have all these different nodes. Remember, there’s a parameter for these nodes. Something might be computing an Int, and something else might be computing a list of strings; different kinds of things in different parts. But they all have to sit in one data structure.

As you know, an OCaml list is parametrized in its genericity. It’s an alpha list. It might have integers or strings, but it can’t have a mixture of things. We kind of need to have a mixture of things. We need to be able to generally plug these guys in.

So, the way you do that is, you create what you might call a poor man’s object. You have a little record. Each record has a bunch of values that are functions that let you do things to the underlying object, but don’t expose the type of that object. Instead of directly storing the underlying object, it’s this record full of closures that you actually keep in your data structure.

So, this turns out to totally suck. It sucks for two distinct reasons. One reason it sucks is it’s slow. Closures are relatively cheap, but if you allocate a bunch of them, it can cost a lot in terms of memory. But it’s also bad as a programming experience. It’s like programming with oven mitts. You have to mint one unique oven mitt for every operation that you want to do. You kind of go in every time and do the exact thing that you can do, but you can’t see what’s happening underneath.

The usual programming style we’re used to in OCaml, where you have these variants, and records, and nested structures that tell you exactly what’s going on so you can really operate directly on it just aren’t there, when you work in this object oriented almost style.

The thing that lets you do what you want is a slightly richer thing in the type system. So, existentials basically let you talk about a thing whose type you don’t know, more naturally in terms of the operations that you have on top of it. Saying, “I don’t know what the type is, but there exists a type that it is, and I can operate on it.” So, that helps. Generalize algebraic data to give you a little bit more expressiveness, so you can do a little more in writing these things down as ordinary listed different possibilities.

So, it was a win in two ways. It was a performance win because we got rid of all these closures that we don’t really need. It was also a comprehensibility win. The code was much easier to understand. Because of that, it was a performance win again, because it turned out that there were optimizations that you could do, because you could see what was going on underneath. You knew the exact structure and names of the kinds of nodes you had. So, there were cases where you could do small optimizations and rewirings of the graph, because you knew what was going on. So, it had a bunch of different wins.

The result of this was, the performance did this. It was suddenly massively better. It was shocking. It was so much faster than end user applications. Serious, relatively well tuned end user applications were three times faster after this change than before. I don’t mean the library was three times faster. I don’t know what the model fly around the library itself is.

But the actual applications got three times faster. The thing that this teaches you is that these applications that you’ve made faster by adding Incremental are in some sense now totally bounded by Incremental. There’s almost nothing left.

So, if you think about it, you have a computation that’s too slow to do all at once. So, you make it much faster by incrementalizing it. But now, almost all the actual computing you’re doing is the incrementalization framework itself, rather than the real computation. So, the returns to improving the incrementalization framework are really high.

That’s what we saw when we actually rolled this out. It was a really massive difference. In fact, the thing that happened recently is, one of the trading system front ends that used Incremental was very slow. It was using v3 from this talk, and had been for a long time. We’d been nudging them about getting it fixed, and, “Eh, we’ve got a lot of things to do.” It was busy. Finally, they’re like, “Okay, okay, we’ll do it.” We moved it from v3 to v8, to the current production version, and it was astonishing. It was so much faster. People say, “This went from the thing that ruins my life to the fastest application I interact with.” So, that was an interesting and gratifying thing to see happen.

All right. So, that was a big deal. So, let’s talk about what there is left. So now we have this library. We use it a lot. We use it for both things that are compute engines, and also for user interfaces, and starting to think about using it even more for user interfaces. So, a few different things that we’ve thought about as potential future improvements, and some that are in flight.

One observation that we’ve made recently, in thinking about using Incremental as part of doing Java Script applications, of all things, because it turns out you can compile OCaml to Java Script and run it in a web browser, is that you could use Incremental to incrementalize the computation of what you need to show to the user.

This is a thing that we’ve thought about doing in a way that’s consistent with the virtual dom approach that’s become increasingly popular with libraries like React, which is a straight up Java Script library that uses this approach. The nice thing about virtual dom is, it’s a way of thinking about your computation of what people see as just a simple function, from data to the thing that’s viewed.

It turns out, optimizing simple functions is exactly what Incremental is good for. So, we’ve thought about using it in this context. One of the things that we’ve discovered was missing in Incremental the way we started with it is, Incremental is really good if you have your inputs and, boom, they’re smashed into a million pebbles, each one being an incremental. Now you’re building up from those pebbles the computation, it works really well.

But if you start with just a big data structure, like a big functional data structure that, a new version lands, and a new version lands, and a new version lands, it’s not clear how to bootstrap that into Incremental in a clean way. One of the things we’re playing around with is, adding some extra primitives to Incremental, that allow us to take an ordinary functional data structure that is changing as an incremental value, and efficiently extract information about individual sub-components, by taking advantage of the fact that a lot of these functional data structures can be diffed efficiently.

So, this uses, an important part of this is the map data structure, which has this function, symmetric dif, that lets you really efficiently rediscover what changes have been made between two maps, taking advantage of being able to cut off that computation on physical equality of the new map and the old map.

So, that turns out to be a productive and interesting little corner of Incremental, and maybe gives us the ability to take the ideas, and take these techniques of building incremental computations, and apply them somewhat more generally. So, that’s one idea.

Then another idea, which is somewhat older, and in fact comes from other research that has happened on previous systems that are kind of similar, like there’s a system called Father Time, which is a part of the kind of guys who work on scheme and racket. In that system, they had an optimization which is potentially attractive, called lowering, I think is the term they gave for it. Which, it’s almost a kind of in-lining optimization.

In Father Time, instead of saying, “You can choose when to use this incremental abstraction,” they just use it everywhere. It was baked into the fabric, every time there’s an open pren in some scheme program, there was another incremental there. That’s a disaster, because you have so many of them that you’re really swamped by the overhead of the incremental computation.

So what they did was, they basically used ways of choosing to merge together nodes, trying to optimize by saying, “It’s not really worth it to have these all broken down. We’ll have bigger nodes that do bigger pieces, and less incrementalization, but overall, better efficiency.”

That’s the thing you could imagine us doing as well. I think the promising direction is to do a kind of trace based optimization where you keep some statistics about how long things take. Then you can use those statistics for making decisions about how to collapse nodes.

So, this is a much less important optimization for us, because we already get to make explicit choices about where we do and don’t want to construct these nodes. So, programmers have a lot of direct control over the performance, but it would still be nice. It would allow you to throw in a lot of incrementality without thinking too hard about it, and then some optimizer could come back and claw back, do some performance for you.

Anyway, that basically covers the whole thing. I guess for me, I think there’s a few interesting things to learn from this. One is, it’s just interesting how far you might need to go from the thing you read in some paper, to something you’re really happy to use in production.

It also, I think, there’s lots of interesting ideas that you have to dig through to get to that really good implementation. It was one of these cases where we didn’t just say, “Make it better.” We really sat down and did a lot of experimentation. I think the end result of that experimentation is, we really understood the problem well and were able to get to an implementation that really delivered useful things for the actual practical work that we do on a day to day basis.

That, I think, is that. Any final questions?

All right, thank you.