Unboxed Types for OCaml

OCaml has a well-deserved reputation for compiling quickly. This is thanks in part to its uniform representation: all values have the same memory layout, so generic code does not need to be specialized and recompiled for each use.

This uniformity has a downside: some programs waste a lot of time converting back and forth to this uniform representation.

In this talk, I'll describe some work-in-progress to remove this overhead, by extending OCaml's type system to allow other memory layouts, giving the programmer more fine-grained control over how code is compiled.


Now that you’re all here and stuck in the room with me, I can finally reveal the actual title of this talk, which is Making OCaml Less Like Lisp And More Like C++.

[Audience: Close the door, quick.]

Yeah. To explain why I might want to consider doing such an awful, awful thing, I’m going to talk about performance for a while. This is entirely motivated by performance. The only reason to want to unbox things is to make things go faster. But to explain just how much faster this sort of stuff can make things go, I need to talk about some actual, concrete numbers, and if you haven’t spent a long time doing a lot of low-level performance work, then the sort of visceral difference between a nanosecond and a microsecond might not be too apparent to you, because they’re both such tiny, tiny units of measure.

So I’m going to borrow a good trick from Brendan Gregg here, which is by explaining all of these timing numbers by multiplying them by a billion. So addition, that takes about half a second. Subtraction, also about half a second. Multiplication, about two seconds. Division takes about half a minute. And these are all for a reasonable processor multiplied by a billion. Some other operations are system call. The very best case that you could have for asking the OS kernel what your user id is, a couple of minutes. Accessing something on an SSD is going to take about an hour. Starting a program in C, so this is running like Bin2, because it’s telling the operating system to start a new process which does nothing and then exits, takes about two days.

If Bin2 was actually a program written in OCaml rather than C, it takes about a week just to kick the garbage collector into moving. If for some reason your program’s written in Python and does nothing, then it’s about two months. And if you’re writing a video game and you have to update the world and render it at 60 frames a second, then you have six months of computation to get the next frame out. Of course, I mean, there are things that take even longer. If you were choosing to do a ping across the Atlantic, you expect that to come back in about two years time.

So the point I’m trying to make here is these numbers cross a lot more orders of magnitude than you might expect. We’re not talking about one operation being 10% faster than another. We’re talking about many, many zeroes and the difference between the two. So this is relevant when we start talking about memory. Yeah, some of them is actually thanks to [Byron 00:02:35] for help collecting some of these. Well, help collecting some of them, others just lifted wholesale from the talk he gave. Accessing something which is in the CPUs L1 cache takes a couple of seconds. These will vary a lot by the CPU architecture you’re using or which model processor and so on, but roughly we’re talking a couple of seconds for L1. The next level of cache a few seconds longer, maybe 20 seconds, half a minute, for the last level of cache. But if you don’t manage to hit the cache at all, then it’s going to be a couple of minutes, in again everything multiplied by a billion, before you can get things back from main memory.

If you are really unlucky, then as well as having to go all the way to main memory, you can get something called a TLB miss, which is the processor not only has to go all the way out to main memory, but also doesn’t actually have the table that tells us where in main memory it’s supposed to go, so it has to first do some accesses to find that and then do the accesses to actually access the data you wanted. So that can take, I mean, five or 10 minutes on a good day. And again, this is a lot longer than two seconds. We’re not talking about there being a 10% difference depending on whether you hit the cache or not. We’re talking about many, many times faster or slower.

Happily this is about as bad as things get these days. If I were giving this talk a few years ago, back in the days when swapping something out to a rotating disk might have been a reasonable thing that a computer could do sometimes. And that meant that when you access a variable in memory, if you were really unlucky, you would actually have to wait for this bit of spinning rust to rotate into the right spot, and that wouldn’t come back for months.

The point I’m trying to make here is each of these levels is 10s, 100s of times smaller than the previous ones, and if you can make your data very, very small, it can fit in the upper levels. And if it does fit in the upper levels, then access to it will be very, very much faster than if you have to go down to any of these lower levels to satisfy any memory requests. So things will be much faster if they’re smaller. So what decides how big your data is, in many ways, the compiler, the implementation of the programming language you’re using. It has the job of taking all of the things that you wrote and laying them out somehow in memory. The biggest question it has to answer is well, how many bytes and what order should this value that you’re representing take up?

There are a couple of other ones which are important. CPUs have multiple different types of registers, so, should this value, if we were passing it to a function, would it be better to pass it in the integer registers or the floating point registers? I mean, if it’s an integer it should go in one. If it’s a floating point it should go in the other. If you make the wrong decision you’ll waste a lot of time. There’s a penalty for moving stuff from one to the other.

And in the context of OCaml, which is a garbage collected language, we also have to worry about when we’re representing a value, do we have to tell the garbage collector about this? If we’re representing something that is an array or a tuple that’s on the garbage collector’s heap, the garbage collector must know about all references to that, so that if it needs to mark it or collect it or move it, it knows where to look. Whereas if it’s just some raw bits of floating point numbers, say, then the garbage collection need not be told about it.

So the best way of laying something out depends on the type, and most types have a pretty obvious, good layout. If we have an array of strings, that should be a pointer to some memory holding all of the arrays, so we just have to pass around this small pointer rather than copying all the array every time. If we have a float, and that should be represented as eight bytes for a double-precision floating point number, it should pass in float registers. No need to tell the garbage collector about it. If it’s a 32-bit int, that only needs four bytes. There’s a fairly obvious way of laying out each of these different types and it’s all fairly straightforward how you would do this.

Anyone who’s done any sort of performance work on OCaml, anyone who’s tried to make an OCaml program run quickly, will know that these obvious, good layouts are not at all how OCaml lays out int32s or floating point numbers. It does something much, much worse. Integers and floats all get their own separate little allocation of a block that’s 16 or 24 bytes long, which just contains this extra value and it only exists to contain this value, and the pointer to this block takes up just as much space as the value that you’re trying to represent.

So why on earth does it do this? This is not a system that was designed by people who didn’t know what they were doing, and still they did this. And they did this because of a really tricky problem, which is actually what this talk is mostly about, which is… Sorry, this is actually how OCaml represents an int32. The really tricky problem is how do we lay out an unknown type? If we know there was an int32 and there’s an obvious way of doing that, if we know it’s a float there’s an obvious way of doing that, but if we just don’t know what the type is what are we supposed to do to lay it out?

There are two fairly obvious solutions. Actually, sorry, first of all, this is actually a more common problem than you might imagine. You regularly find yourself with a program which contains some unknown types. The simplest example is you’re writing a polymorphic function like map. Map takes a list and the elements of the list are of any possible type, so when we’re type checking and compiling this function, we do not know what type the lists are because this function has to work for any possible type.

But writing polymorphic functions is a thing that you, there are many of them, but most, I mean lots of functions are just written first order. Even in cases where we’re not writing polymorphic functions, whenever we use a module that defines an abstract type we still have the same problem. If location.t, if there’s a module called location which defines a type called T, then when we’re compiling this we don’t know what that type is. This is still, we need to compile in the presence of an unknown type, and we need to know how these values are supposed to be laid out even though we don’t know what the type is.

There are two standard solutions to this. Number one, we can lay out all of the types in exactly the same way, which is the old Lisp approach, and then it doesn’t really matter how we’re going to lay types out because it doesn’t really matter that a type is unknown because they’re all going to be laid out the same way anyway. Or number two is we can delay making any choices until we actually know what the type is, which is the C++ version, that like when you write a template in C++ it doesn’t actually generate any code or compile anything. It waits until the template is used as a particular type.

Unfortunately both of these are horrible in different ways. The horribleness of the Lisp one is the inefficiency of the representations you end up with. You have a particular uniform representation. In OCaml that representation is, there’s a machine word, so 64-bits long, and if the low bit is zero, it’s appointed to something the garbage collector understands. If the low bit is one, it’s just an integer. And for things which fit into this, which are basically pointers to garbage collector stuff or integers, then this is reasonably efficient. But for anything that doesn’t fit into this, like for instance a 64-bit float, which might actually end in zero and will not be a pointer, then those have to be encoded via some more roundabout way. You end up having to, well, what OCaml does is encode them as a pointer to a block that was invented just so that it could hold this one thing. So when you demand this uniform representation, then anything that doesn’t fit perfectly into this uniform representation gets boxed. It gets a separate allocation. It gets represented as a pointer to that allocation, and this consumes vastly more space and is much less efficient than we’d like.

But there are also problems doing the C++ version. There’s a practical problem and a theoretical problem with doing that approach in OCaml. The practical problem is well it basically doesn’t work, that the amount of time C++ spends expanding templates and recompiling the same function over and over again results in compilation times and code sizes that would be quite a shock to the system, to OCaml programmers who are used to different things. So C++, since it has to expand templates every single time, would spend a lot more time and generate a lot more code. There’s also a slightly more subtle theoretical problem, which is even if you had engineered your way around this one, fundamentally you can’t quite do this in OCaml because it turns out in OCaml, a polymorphic function is compiled once for all possible uses of it and you can write really weird polymorphic functions. You can write things which pass around a polymorphic function at one time and then get used in lots of different ways and you can’t predict which ways it gets used or compile time.

You can also write a recursive polymorphic function which, when you use it as one type it recursively calls itself at an even bigger type, so the whole set of different types that it gets called at is not even necessarily finite. So if we did this whole program thing of expanding it out for every different type it’s called at, that’s not even necessarily going to be a finite collection of types.

So the next thing that it might seem like we should do is sort of mix the two, where we lay things out like C++ when we know what the type is, and lay things out like Lisp in the uniform sense when we don’t. This is an approach that has a long and storied history. There were a few solid attempts at this at OCaml in the 90s, and many of the languages beside. The basic trouble is it gets very, very fragile to determine whether or not the compiler knows what the type is. In particular, that gets broken when people start… It’s considered good subroutine practice to abstract a type. You have a type that’s being used. There’s no abstraction. It’s just said, “This is an int32.” And then someone says, “We could clean this up and move it into a module.” And now suddenly the compiler knows less and all the code is worse. So these changes, which you expect not to make everything worse, can actually affect whether or not the compiler knows the type.

The one that’s an even worse effect is the transitions between the points of the code for the compiler who knows the type and the points where it doesn’t know the type. You end up with a compiler that is automatically introducing a boxing to pack it up into the Lisp style when it’s entering a point where it doesn’t know it’s the type, and having to unpack it when it does know. So generally it’s very tempting to try and automatically choose between these two strategies, but the amount of time that you spend, the overheads that it introduces are firstly, wildly unpredictable and secondly, often much bigger than you save.

So we need to find something else to do. I am a type systems researcher, so my solution is add types. The great thing about this solution is that it is polymorphic in the problem. So in this case the issue is that we don’t know enough about the types themselves. We don’t know which ones have which layers, which ones are known, unknown, big or small, so we’re going to add types to the types. Okay, so the rest of this talk is going to be me explaining a load of code examples, so I would like to warn you right away that as this talk progresses, the connection between these code samples and reality will get more and more tenuous. This first line of code is something which you can write in released versions of OCaml at the moment. This second line of code where we’ve defined the type to be equal to this new type, #int64, this is something that works only on an internal branch of OCaml with support for this in the type checker. Soon we’ll get onto things that don’t even work in that branch yet, but for the moment this at least works.

Here we’ve got a new type, which is this #int64, and this is an unboxed int64. This is represented as 64 actual bits, no tagging. The garbage collector is never going to attempt to follow one of these. It’s passed around in integer registers and it does not register with the garbage collector. In fact it must not be registered with the garbage collector because the garbage collector would explode if it tried to follow it as a pointer. I promised types for the types. These two types have different layouts. Type S is a type of layout value, so at the moment all types in OCaml have layout value and we’re adding some which are of different layouts. And this second type is a type of layout bits64. That means, so things with layout value are represented in OCaml as the tagged pointers I’ve been talking about, either a GC pointer with a zero at the end or an integer with a one at the end. And things of layout bits64 are represented as just 64 opaque bits that get passed around.

Again, this is not the first time anyone has ever thought of using kinds for these types for types for describing unboxing. A good reference is actually some similar work that this is partially based on that was done for Haskell by Richard Eisenberg and Simon Peyton Jones. But the general idea is we should attach these layout annotations to types and use those to represent the layout of a particular value. One thing that this lets us do is we can write abstract types. Here’s an order id type. It’s an abstract type that might appear on a module interface, and I have not told you what order id is. I have not given you, there’s no equal sign, there’s no definition for the type. But it is specifying the layout, so we can build the layout but not know the type. So that means that this is still an abstract type. If this appears in the library interface, users of that library cannot freely create order id’s or multiply an order id by seven or whatever. They cannot treat it as though it were an integer. But they do know that it is 64 bits of opaque data.

This is relevant because you can then write other libraries which are parameterized by things of this layout. So for instance a hash table with 64-bit keys and a particular optimized 64-bit hash function says it is two type parameters, the key and the value, and the value is… I’ve omitted any particular layout there which just defaults to, it’s an ordinary OCaml value. But the key is specified as a bit-64. So that means that there’s an opaque type order id and there’s an opaque hash table type, and the hash table is optimized for just 64-bit values, and there’s enough information there that we know that we can actually make a hash table with order id’s as the key, even though order id is an opaque type that we can’t create or use outside of the library. And the hash table’s an opaque type, but we can still pass those.

So we can write a table from mapping order IDs to strings. If we tried to use the same hash table in mapping strings to strings, then we get a big compiler error saying that, “I was expecting a type with bit-64 layout and you gave me a type with value layout.” From the first two lines there you can see that this is actually real output from a prototype because the error message is terrible. If this was a rigged demo, then the error message would be a wonderful explanation of what had gone wrong.

So type parameters, this is the reference type from the standard library of OCaml to find us a record with one mutable field called Contents. Type parameters default to having layout value, which could also be written explicitly if you wanted, but you can explicitly specify what the layout is. Here is a reference type whose parameter must be an immediate type. Immediate is a term used in OCaml meaning those types which are always represented as the integer case. They’re always 63 bits of stuff with a one at the end and there are never GC pointers.

The reason you might care about things being immediate is that lots of operations are faster on immediates because you know you never have to get the garbage collector involved. You can read and write and modify these references and never have to keep the garbage collector informed about how the object graph is changing. That means that operations on this reference will run efficiently. You’re never invoking the garbage collector, even though they’re compiled without knowing what the concrete type is because we statically ensure that only an immediate type could be passed here.

And so this system of layouts for type variables is integrated into the type inference mechanism for OCaml. So for instance if we write a function here that creates a new one of these references, then we’ll find that we might expect it to have a type like this, say this is what the normal ref function would have, like takes an alpha and returns an alpha in ref, but this is not actually the type that you get. In fact if you try to write this you’ll get a type error saying that well, you haven’t… Alpha’s rendering other values and this type doesn’t even make sense for all values. The actual type of the make function is for all types alpha which are immediate types, then it is such a reference.

So as well as working with type inference, this also fits into the module system in OCaml. For instance, here are two different interfaces that a module could have. The top one says it’s an identity function, but it’s an identity function that works for all value types. And the second one says it’s an identity function, but it works for only the immediate types. And so you can write a functor which takes an A and acts like it’s a B, that if something works for all of the value types, then it will certainly work for just those that have immediates at the end. But if you try to do it the other way around, then you’ll get another error message saying, “No, you’ve claimed this would work for all possible types A, or all possible value types, but it doesn’t match because this one actually only works for the immediates.”

So the point I’m trying to make here is that this system of layouts for type variables is integrated fully into the OCaml type system, that it works with modules and type inference and these new type things that you may have come across, or polymorphic records, or polymorphic recursion, or all of the weird features of OCaml you can now specify, whenever there’s a type variable, it could have a layout as well.

One of the most interesting uses of unboxed types is to define more efficient record types, and about here is the point where we start diverging from reality a bit. I think this might be the last example here which will actually type check in the prototype so far, but this is the next step on the stack. We could define a point type which has a pair of X and Y coordinates, both 64-bit integers, but of course this will have a terrible representation by default on OCaml if this is represented by default as two pointers, each two separate allocations, each containing 64 bits of data and some junk. So what instead we can use is the #int64 type, so this means that this record with the #int64 is actually, the X and Y are not boxed in 64s of layout value, but real 64-bit values of layout bits64.

So the original point type, we could construct them by specifying what the X and Y fields are and we can project out the X and the Y components of them. Then we also get this unboxed record, this #point type which you can construct with the #-curly brace and project out as normal. So that means that whenever you see a type definition like the type point up above, this actually defines two different types. There is the ordinary point type, which is boxed, and it’s of layout value, so that means we have a type point, which is a value. It’s represented as a GC pointer. But we also have this new type #point, which is the contents of that. It’s represented as two bits64s. That means that in this example we’ve got this function, which is taking a point as an argument, that takes one argument at one time. It expects a register containing a pointer to values, whereas this one, this takes two arguments. This expects two registers to come in, one containing the X and one containing the Y.

So now that we’ve got these two different types being defined for each record, one of the most fun things we can do with them is inline them into other records. And so given the point type, we can then define a line as a line being given by two points. And here we’re choosing the two point fields to be both #point. This means that if we’d said that X was a type point, then it would have been represented as a pointer to this record with two fields, but instead we’re saying it’s a type of #point, so it’s represented as bits64 times bits64. So the unboxed #line is actually just four 64-bit values in a row. So if we want to pass around a line value, pass that to a function, then that will be passed as a single pointer, which contains points to a block of RAM with four 64-bit values in a row, and if we want to pass around the #line, then that will be passed as four separate arguments for the four components of that inline record.

Another useful application of this stuff is something that many beginners to OCaml are really surprised by. It’s the fact that constructing an option in OCaml actually has to do some work. This is quite surprising if you’re used to say Java, a language with nulls. It’s surprising that this can actually be like a noticeable expense. The option type in OCaml, which is similar to the Maybe type in Haskell, has two constructors, a value of type foo option is either some and the foo, or it’s none. So it’s used to describe values that may or may not be present. So what you’d like is for the none to be just represented as a null pointer or something, and then some to be a no up that just passes the value straight through, but what actually happens in OCaml is when you write Some x that allocates a little box just to hold X in.

And there is actually a good reason why it has to do this, and the trouble is that in OCaml all of these are valid and different values. You can have None, and Some None, and Some of Some of None, and so on. So these aren’t used very often, but they do actually come up in real programs. You can have a hash table of maps, say strings to an int option, so the keys are strings and the values are either an int or none. And then if you look up something in the hash table, the result of the lookup function will be an int-option-option, where none means the thing was not found in the hash table, and some none means the thing was found in the hash table, and what we found there was none. So the option type in OCaml has to do work in the Some-case. It has to do work because it needs to distinguish none from some, none from some of some of none, and so on.

This is something that we do need in general and this is something that’s fundamental to the option type in OCaml, but it is still very disappointing when you’re trying to write a record with an optional string field. But we know that a string is not going to be an option. We know that when we’re just talking about a string option this option, option, option, option is just not going to come up, but still in that case you have to pay for the cost of defending against this possibility. However, once you’ve got layouts statically showing up in the type system, then you have a means of statically preventing this thing from ever happening, rather than having to conservatively deal with the bad case.

So we can define, and again because Some None is a thing that people actually use, we can’t just magically change the option type. This will have to be a new different type. But we can define nullable types. A nullable type is either None or Some, much like an option. But the interesting thing is that the type parameters to the nullable type is something which can of itself be nullable. It is a layout non-nullable, which is another, like immediate, another subtype of value. It describes basically those value types which are not foo-nullable. And so that means that a type T string nullable is a thing that we can write. We can have a record field which is a string nullable. It’s either null or a string. But if we were to iterate this and get a string-nullable-nullable, then this would just be a static type error, so we don’t need to do any work at one time to worry about this possibility because we’ve just statically prevented that thing from even being legal to write.

So inline record and unboxed options are things that we’d like to have fairly immediate. These are things that have clear value. There’s a few more speculative things we’d like to play with at some point. Everything I’ve shown you so far has been fixed concrete layouts that every type variable has exactly one particular layout, which is important because the layout determines how it’s compiled. One definition corresponds to one block of compiled code, so we’re not generating multiple versions of each code for each different thing. But sometimes it can be nice to do that, so it would be nice eventually if we had some sort of, essentially, macros. If you would like to define something we just parameterize by a layout and it generates different versions of the code for each of the layout. But again, at this point we’ve gone so far away from the prototypes that are implemented, this is essentially just OCaml fan fiction, rather than anything that’s reasonably going to happen in the next while.

So I think I’ll just stop there and take questions, rather than going any further than that line.

Audience: What happens when you run out of registers?

In what context?

Audience: When you’re doing one of these layouts that’s fully fixed and you want to call a function and it’s too big.

Oh, sure. For the backend of the compiler, they’re just going to show up as though you had made a function, took lots and lots of arguments. So that already has code for this case. If you were passing more arguments than there are registers on your machine, then the extra ones get passed on the stack.

Audience: And then when you run out of stack?

I have a feeling that if you try to write a function that takes eight million arguments, something else will break first.

Audience: Is it a natural adaption of Levity Polymorphism [inaudible 00:29:20] OCaml? Is there any problem [inaudible 00:29:20] to OCaml?

So the biggest difference is that there’s a lot more subtyping in this version. The Levity Polymorphism paper tends to remove all, like replaces all uses of subtyping with polymorphism because there’s very little actual use of subtyping in Haskell, or subkinding I mean. In this context, because of the immediate types, which is a subkind of value and the non-nullable, which is also a subkind of value, then there is some actual subkinding going on. There’s some actual non-trivial subkinding relations. So, in fact, on that particular design decision we’ve mostly gone the other direction, either using just subkinding and avoiding any sort of polymorphism over layouts. So, yeah, it was started from a similar point. We’ve gone a different direction for these reasons.

Audience: So if I understand correctly, right now if you write a function you know exactly the layout, and you said like in the future maybe we’ll have macros to support other features. But then I was a little bit confused. When you showed the module definition, you said that the ID is a subtype of the other ID, so maybe that’s just me not understanding how this works.

Oh, yeah sure, so I’ll go back to that example. Okay, so here to-

Audience: It just gave the impression was that one of these is actually an implementation of [inaudible 00:30:40], but maybe I’m wrong.

Right. So in this particular example, because immediate is a subkind of value, it’s actually possible to write… You can write a single ID function, like a single piece of generated machine code, which works for all of the value types, like the same bytes of code work for all the value types, and then you can say this block of code also works for all the immediate types, which is-

Audience: But they have different layouts, so how would that work?

Oh, so all immediate types are in fact value types. Immediate is a subset of value.

Audience: Oh, okay.

But, yeah, so that wouldn’t work for the reasons that you were saying, the lack of macros. If this example were instead like value and bits64, which are genuinely incompatible layouts. Yeah?

Audience: So one of the points you made earlier on is that this kind of trade-off between Lisp and C++, which was like a terrifying pair of things to be caught between, that part of the issue there is that abstraction can break your knowledge about what’s going on. But isn’t that sort of like an own goal? Like you could choose not to do that. It can be abstracted from the point of view of the semantics of the language, but the implementation is perhaps free to know about it. So that’s like until… So what my question is, why is there this default to have the interface really defined and how things are compiled also, we’re not going to peek underneath for publishing purposes? And then I’m also curious how you feel about things like MLton which try and solve this problem by just giving you like, you can do whole-program compilation so that you have at your disposal a slow compilation mode but it gives you much better optimization and lets the compiler know more.

Right. So first the abstraction question. I’m kind of averse to systems where the implementation sneaks in and learns more stuff about the thing that doesn’t appear in the source code, because when people are writing high performance code they really do care about whether or not this is going to be boxed, or whether or not something is going to be applied, and I generally like things that matter to the programmers to show up in the source code at some point. So I’d like all of the aspects of the type which are relevant to performance, like how many bytes it takes and so on, those I would like to actually show up somewhere in the source code and a layer of annotation, and the mli file seems like the right place.

Regarding MLton then, there’s a whole lot of different options. There’s a big design space available to you if you decide to go for a whole-program compilation. So if you’re willing to take everything at the same time and work out kind of a whole-program reachability of which possible instances of a polymorphic function can be used and compiling only those. It tends not to be a thing that you can turn on and off for more performance, because the actual representations change. You could have an entirely separate implementation of two different implementations of the language, one which boxed and which didn’t box, but if you ever expect to link code together compiled in the fast and slow mode, that’s not going to work. So, yeah, this is mostly informed by wanting to keep OCaml’s separate compilation and fast incremental builds and so on. If you don’t have that constraint, then well I mean, then MLton already exists. Use that.

Audience: What do you do if you’re targeting different back ends where it is not possible to treat something as an immediate? So if you’ve defined something as 64-bits but you’ve got a 32-bit board, so you’re targeting a different backend? How can you use this [inaudible 00:34:18]?

So is this you’re using a type which is a different size or because the backend-

Audience: I have a type that is a #int64 and now I’m targeting a 32-bit machine, what happens?

Oh, so happily this is a thing that the OCaml backend can actually already deal with. The thing that’s interesting here is we’re not actually adding very much capabilities to the backend. It’s just OCaml can already do most of these things. There’s just no way of telling it to from the source language. So in those cases what it actually does is it splits the thing into two 32-bit registers and passes around the two halves and generates whatever CLLI code is necessary to do the addition, I mean to do whatever operations you define on the [crosstalk 00:34:55].

Audience: It’s probably like you’re doing usual compilations and then you find the time to do the [inaudible 00:35:00] depending on the backend you’re targeting, it should figure out a reasonable implementation for the layout.


Audience: Yes, I have a pretty basic question. You canvased it earlier. But if you call a polymorphic function, is there one implementation of it that will exist even with these immediates, or will you generate a new implementation for like 32-bit parameters and 64-bit parameters?

No, no, there’s a single implementation.

Audience: How does it know what to do, instead of what you reference in the box, the pointer, when you actually apply it?

Oh, if you apply a polymorphic function to a box in 32, then the whole box gets passed around, so it never has to inspect [crosstalk 00:35:37].

Audience: [crosstalk 00:35:36] clicking unboxed or one of these…?

Oh, so yeah polymorphic functions are polymorphic over a particular layout, so it will be polymorphic over all the value ones or over the bits64 ones, but not both.

Audience: So you’d have many implementations of map for the different sides to the arguments.


Audience: And that would be like [inaudible 00:35:51]

Yeah, yeah. And if we want to do better than that, then that’s the deep macros type stuff that I would like to get to eventually where we’re generating different ones for each different layout.

Audience: Okay.


Audience: What about [inaudible 00:36:03] unboxed sums where you have to do the non-nullable sort of thing and see that every element is not an unboxed sum?

Yeah, so very much want to do unboxed sums but haven’t finalized the design for them yet. Yeah, for things which are not the easy case of options then you’ll need to have some way of doing the tagging. There’s some slightly tricky questions about, what if some branches of the sum have unboxed raw data and some branches of the sum have GC pointers. It would be a bit much at the moment to try and teach the garbage collector that it’s supposed to look at this word to decide whether or not the next one has a pointer. And so I don’t think there will probably be some restrictions on exactly which types you’re allowed to have in the sum, like about some restrictions on mixing boxed and unboxed and different veins of the branches, but we want to do that but haven’t fully explored it yet.

Audience: That actually helps a little bit. I was wondering about what the garbage collector is aware of. Is it conservative? Will it ever interpret bits64 as a pointer accidentally, stuff like that?

So it’s not a conservative collector. It’s given precise information. So, yeah, a correctness condition is the garbage collector must never be told about a bits64 pointer because if it looks at a bits64 value, if it ever looks at one, then it will attempt to follow it as a pointer and will crash or do something horrible. That’s one of the reasons why value and bits64 and immediate are different layouts. They all take 64 bits of space, but values are ones that the garbage collector must be told about, bits64 it must not be told about, and immediates, the garbage collector may, it can be told about them. It doesn’t really matter. It would ignore them if it found them.

Audience: Would it be possible to convert back and forth from the boxed and unboxed versions of a type?

Yes, so I didn’t show it here, but that’s mostly because I haven’t actually picked a syntax for it yet. But, yeah, we need some means of converting between say a point and a #point. The main problem here is how many different places can we put a hash in the OCaml grammar before the parser generator dies? So one thing is that while we haven’t actually picked the syntax, we do want to emphasize that there will be some syntax there. That’s not a thing that’s going to happen automatically, that point and #point are different types that will be in there or if you try to use one as the other. But, yeah, you can turn a #point into a point by making a box and you can unpack it to turn it the other way around.

Audience: That has to be syntax because it can’t be a function because it would be kind polymorphic?

Yes. Yeah, it could be. I mean, yeah, there would need to be layout polymorphism, kind polymorphism in order to have that as a primitive, which I mean puts it in the same… That puts it in the same category as things like record projection. There’s a separate record projection operator for every record type that you define. Yeah?

Audience: This might be wildly premature but is there any like good napkin-math on the effect on compiler performance of doing these kinds of like kind checks?

No, it’s not… I mean I’ve been working on it for a while. If it were dramatically slower I would have noticed at this point, but there’s nothing. It’s not suddenly four times slower. The amount of work that has to be done in the compiler is surprisingly little, but yeah, it has neither been benchmarked nor optimized to any serious degree.

Audience: Yeah, wildly ahead, but are you going to, let’s say, combine this with a resource polymorphism paper stuff and rewrite the garbage collector itself in OCaml?

Yeah, I expect to do that in the next week or two.

Audience: It is multiplying times by billions.

Yeah, maybe dividing in this case. Yeah?

Audience: So are there plans to bring that nullable example into the OCaml standard library? It seems quite useful.

Sorry? Oh, the nullable example.

Audience: Yeah.

Yes, I would quite like to, but again, this is a thing that we would like to implement and a feature that we would like to implement in the prototype that is only half implemented, so it will be a while before that shows up in the standard library. Yeah?

Audience: Is it possible to mix value fields in a record with unboxed fields?

Right, that’s a good question actually. Again, so at the moment we’re trying to write a type system which reflects what the backend of OCaml can actually do, and OCaml at the moment does have this restriction that an allocated block must entirely consist of a sequence of values that the garbage collector can parse or contain no-… There would just be like a string, just contain no values that the garbage collector can parse.

Audience: So you’re not touching like GC internals.

Correct, yeah. I mean, eventually at some point it would be nice if we could relax that restriction a bit. It’s a bit difficult because it might induce some overhead elsewhere in the GC and it’s a somewhat separate project. If we ever get around to allowing mixed pointer, non-pointer types, or mixed pointer, non-pointer blocks on the heap in OCaml, then we can easily reflect those back in the type system here. But for the moment we’re just reflecting what the GC does at the moment.

Audience: So there’s a third layout that you didn’t talk about. Is that one going to go away? You didn’t raise the floats?

Which one was that?

Audience: The float array hack.

I tend to just feel like pretending that doesn’t exist.

Audience: Could we make it formally not exist? It only has an impact on every polymorphic function, right?

Every polymorphic function that uses arrays. I mean, okay, so we could introduce a new layout which is the layout of those types which are known not to be float, and then you could write arrays with that as the type parameter, which would be anything but float, and then even with the float array hack turned on, you would have efficient arrays. I would really rather not do that, because it seems like we’d only prolong the lifetime of this hack that I really want to die. So I’d prefer that we just don’t do that and just pretend it never existed.

Audience: Do you have any especially compelling benchmarks of code that is faster now that you have a prototype where you can do this nice memory layout?

No, because the prototype is very, very prototypy. For instance, the prototype at the moment is a prototype of type checking, so it’s doing all of this type inference layout annotation stuff and so on, and if you give it a fully annotated program it will go over it in detail and say, “Yes, this is an excellent unboxing scheme. This is exactly how I would lay this program out for maximum performance,” and then will ignore all of that and compile it just as it used to. So we’ll have to get over that hump before we can do any benchmarking.

Audience: But you did not show [Aries 00:43:15], like Aries have supported [inaudible 00:43:18]

Oh, so that’s true. Those aren’t here. There’s an old patch to the compiler from about three years ago by Leo White, which adds a mechanism to define new array types, so arrays of a particular type and so on, which as this wasn’t merged at the time, but becomes much more compelling with this because it’s exactly the syntax that you need to define an array of a particular unboxed type. I mean, it’s three-year-old code and it can’t be compiled. It’s moved on. So it’s not a thing that we can just merge immediately. But I would eventually like to get that in to do exactly this.

Audience: Thank you.

Audience: Maybe it’s just my lack of understanding but maybe you can explain a little bit why you don’t want to use kinds? Why don’t you want to use polymorphism kinds and you’re saying that it will be resolved with macros instead?


Audience: So you’re saying that the polymorphism, like there’s going to be no polymorphism that will be set up for the kinds and you instead will use macros. Why not use kinds, or am I just misunderstanding?

Oh, so the Levity Polymorphism work actually has the same restrictions that we have here, that things can only be compiled if they… Like it will not compile a kind polymorphic value. It will just allow kind polymorphic types and so on. So both the Haskell design and this design are going to more or less the same final set of restrictions. It’s just there’s a slight trade-off on whether you get there via expressing it via polymorphism or expressing it via subtyping.

Audience: So Haskell also generates multiple versions.

No, it has the same restriction in the types to prevent you having to generate multiple versions.

Audience: Oh. Okay.

But, yeah, I would eventually like to get to use polymorphisms for expressing multiple versions, but that’s a long way off. Any more questions?

Audience: Did you have any trade-offs that you had to make or other designs that you thought of? I thought it was a very [inaudible 00:45:19] presentation? It seems all set. I wonder if there was a process that was interesting or messy to get to this design?

Oh, very much so, yes, there is. I mean, even if with the part I presented to you there’s still quite a lot of different design choices done to, like quite fine-grain things about whether type variables should default to a kind value or whether you should infer what layout they actually have, and so there are a lot of both large and small design decisions, not all of which have actually been made, and very few which are actually settled, so, yes.

Audience: So you have some nested unboxed record, whether that be something simple like a rectangle containing two points or something bigger and even more nested, and I want to access some field deep down in it, so rect.bottomleft.x.


Audience: Would that need some kind of optimization to not copy the entire bottom left point and then access just one field of it, and like that would get even worse in a bigger case, or would you even do something fancy like a simple version of borrows in Rust?

Oh, I don’t think we need anything quite as complicated as that. I mean, so the simple options for getting that to compile right are either like detecting sequences, or when you’ve done A.B.C.D, or actually just loading them all and relying on the fact that the compiler can see that these are the loads that are not used at all and discards them at a later stage of compilation. It’s certainly something we have to pay attention to. It would be bad if we ended up copying huge amounts of stuff just to access a single field, but I don’t think that will require any particular attention.

Host: Okay, well, let’s thank our speaker.

The next great idea will come from you