How to Build an Exchange

Brian Nigito

Jane Street

Electronic exchanges play an important role in the world financial system, acting as focal points where actors from across the world meet to trade with each other.

But building an exchange is a difficult technical challenge, requiring high transaction rates, low, deterministic response times, fairness, and reliability.

This talk looks at the question of how to design an exchange through the lens of JX, a crossing engine we built at Jane Street in the last two years. Performance plays an interesting role in this design, in that, although the end-to-end latency of the system is not important in and of itself, the ability of individual components of JX to handle messages rates in the 500k/sec range with latencies in the single-digit microseconds helped us build a replicated system that is both simple and robust.

Transcript

What is an exchange? We’re going to talk about it from the perspective of the data structures, and the messages that get passed around to do what an exchange does. And so the first data structure we’re going to talk about is really just the limit order book. And this is a NASDAQ workstation style book, where it’s got the best prices to buy on this side, and the best prices to sell on this side. And you’ll see that buys are sorted in decreasing order, because when you sell something you’d like to get the highest price for it. And sales or offers are sorted in increasing order, because when you buy something, you’d like to buy it as cheaply as possible.

You’ll sometimes see what’s known as a ladder view, where the sales are on top, and then the buys and then the inside, or the touch is where they meet. I don’t use that because somebody has a patent on it, and they went around suing people based on that representation. So we try to avoid that. But you will see that representation. In this book, we’ve got some different market participants. And these are just fictional market participants, for those of you from Goldman and BAML.

And so this is essentially the representation, and we could talk about the data structures you might choose for keeping this, once you understood the heuristics. But I imagine that, that’s pretty clear. And so, what are the key messages? Somebody can place a new order, saying that they’d like to buy this particular instrument, in this case, Apple at 33.29 for 500 shares. And you can see that the book is sorted in… It’s going to be probably sorted in price and time. And so they are sorted right there. That’s one transaction. Somebody would like to place an order to buy or sell.

Now here’s another one. So in this case, this participant decides they want to sell at 33.29. And this one is special in the sense that, as we noticed before these prices, the bids were strictly lower than the offers. So in this case he’s selling, and he’s selling at a price that we would say is marketable. And so atomically this is going to create a trade, and they’re actually going to exchange shares of Apple right there. And so, the system is going to just deterministically apply its rules, and generate executions going back to the individual participants.

And of course, you’d like to be able to cancel if you had an order outstanding, and you’d no longer want to buy or sell at that price, you need to be able to pull it back. And that’s basically it. That’s probably about, I’d say add orders are probably half the messages in the market. Cancels are probably like 30 or 40%. There’re some cancel/replaces, which are a combination of those two. And executions are probably 1 or 2%. And then the rest is basically noise. Any questions on that? Hopefully this is pretty transparent to most people. Painfully bored yet? No?

And of course we have multiple instances of these instruments. But I would say it’s not even clear that we do. Because there are certain kinds of order types, and situations where you’d actually like to reach across, and touch two instruments atomically. And so it’s not true that in all cases it would be clear, that you would want to paralyze at the level of an instrument. So what are the requirements?

Generally in the U.S. equities market, you’ve got to have a lot of scale. And by scale I mean probably about 3 million messages per second, at peak rates. You have thousands of participants and connections, and you’re trying to coordinate their activity. You probably have several million live orders. And in U.S. equities may be around 10,000 symbols. And there are different markets with different requirements, but we’re going to stay somewhat inside of those heuristics.

You really want fairness. And so, you’ve got a bunch of people who are competing against each other, and you’re trying to have the information arrive to all of those participants at the same time, as much as possible. It’s gotten a lot of scrutiny in the past decade or so. And a lot of the fairness issues are small things that even crop up unintentionally. People just say, “I don’t know why, but this system on this connection seems to be performing better.” And so some people are hunting for advantages, I guess most participants are and some are just recognizing that in aggregate, some things are performing better than others.

I often tell the story that when I was at Iland, and we had TCP connections for market data, we had one system that started to multiplex and handle multiple connections and we just had a four-loop, where we would connect all the connections, and aggregate them up. And when we sent out messages, we’d loop over them and send them out. And so what do you guys think happened?

Everyone tried to get in first.

From the moment we shut the exchange down, to the moment we brought it up the next day, there were thousands of connections per second, just trying to be the first guy in that list. And so you’ll see that a lot of the architecture, we’re thinking about how we can make this information appear simultaneously in multiple places.

You really want reliability, I think that partially that’s obvious. And you need durability. If we tell two participants that they have traded, it’s going to set off a chain of events that’s going to cause them to commit capital in other places, do other trades. And if you forget or tell people, “No, just kidding.” Or, “I told you to do the trade and you didn’t,” you’ve really upset a lot of people, and created a lot of problems. And different systems can tolerate different amounts of it. Exchanges have limit of liability. Legal stock exchanges. What we run here, one of the businesses we run is known as a single dealer market, and it doesn’t have the same kind of protections. So we have to be extra careful.

And you really need to be robust to bad clients. You can’t really have somebody who’s badly behaved, affecting other participants. And badly behaved could be because they’re just optimizing for something different than you are, and you’ve put incentives in place in your order types, that created this desire for for them to do something that is essentially painful. Or they’re just careless. There’s lots of reasons. One of the examples I give going back to the previous slide, you may have noticed, when this firm cells at 33.29, what price does GSCO get? See there’s an overlap. And so there’s a decision that needs to be made here, as to who gets that price improvement, we say. And it’s traditionally done, such that the price improvement is given to the aggressive order. Encouraging people to send their most aggressive price, in case the book is changing at that time. There are various order types and systems that don’t do that but Incident many years ago did not do that, and they had hidden orders inside, that if you overlapped with them, they would get that game. And so what do you think happened?

Very quickly you started to see orders that would try every price point. Walking down, walking down, walking down, walking down, until they got an execution. So you want to think about those incentives when you design these systems.

We’re going to put together what has often been called a crazy design. And I know that when I first saw it, I looked at it like, “Really?” And it was particularly interesting when some senior NASDAQ technologist, this would have been around 2002, would come in to get a description of the system, because people were fairly friendly. And NASDAQ was running on big mainframes, and tandems. And we told them that actually we take the entire market, and we run it through a single prototypical x86 machine. That’s it, just one. It’s that one over there called Monster. Because it’s a big one. We bought the best one Dell made, but that’s it. And the gentlemen would walk around and be like, “You can tell me, where is it really? No, I know it’s here somewhere. Come on.” But no, that was it.

And so here’s our design. That’s it. We’re going to have single application, we’re going to call it Matching Engine. And we’re going to have some client ports, which accept connections to individual clients. And they’re going to perform transactions with the Matching Engine. Send order, cancel, et cetera. And the Matching Engine is just going to keep the data structure in memory right there. And that’s it. And he’s going to acknowledge the responses back to those ports.

There’s more though.

But we have other applications that are interesting. We have something we call Drop Ports in the industry. And so there may be somebody who clears or guarantees those trades, and they would like to know the activity of their participants. And so this Drop Port, he might need information from this port. He might even need some information from this port. Because he could be clearing for… A single port could have multiple entities submitting orders on it. You might have hundreds of ports, and he may need some arbitrary combination of all of them. And you’d also like that information to be equally timely. We need to tell the public. We need a trade reporter so that when transactions happen, we go to a trade reporting facility, and we tell them that these happened. And we want those to happen, again, on parity with all the rest of these. And we want that to be fair.

Most exchanges have some system of market data, where they’re telling you what’s happening on the system at the same time. And generally, it’s really just an anonymized, and slightly filtered version of the raw transactions I showed you before. It’s going to hide some of the names, certain kinds of hidden activity isn’t shown. But by and large, he’s just showing you the transaction log as it happens, so that you can build your own copy of the market and follow along.

Everybody always likes this name. But since the Matching Engine has got a lot to do, we try really hard to factor out as much logic as we can outside of that. And one of the examples that’s regularly given, is something we had called the Cancel Fairy. And so people would place an order and say, “I’d like this canceled in two minutes. And I like this one canceled in 20 seconds. And this one by the end of the day, and this one in four minutes.” And the Matching Engine was like, “Ah, noise.” You either want it canceled immediately, or sometime in the future. And if it was sometime in the future, he acknowledged it. And there was somebody who was observing those transactions, who would then send a cancel on his behalf two minutes later. And you would run three or four of those. One of them could crash, it wouldn’t really be a problem. They would race with each other. It was what it was.

And it was also nice, because the core business of placing and canceling orders, was kept extremely simple. And you can layer a lot of functionality off to the side. Other more complex matching scenarios, like auctions, typically also done outside of that. The auction process, the continuous limit order book, where the orders are coming in matching immediately and moving on, that’s pretty straightforward. In auctions you’re going to aggregate a whole lot of orders, often in overlapping prices. And then you’re going to run an optimization, to find the price that maximizes the shares traded. And that can take some time. And so, that optimization is run independent of the rest of the transactions that are going on. And the results are shot over to the Matching Engine to report back to clients.

So we’ve got a lot of systems here, that all need to get this information at the same time. Any ideas on how we’re going to do that? Thoughts? Any network programmers in the crowd?

Multicast?

Multicast. How many people know what multicast is? All right, most people. So for those of you that don’t, TCP/IP networking has built into a lot of the hardware, the network cards, and the switches more importantly, the ability to have logical addresses. That when you send a packet to this address, it’s electronically repeated to every interested party. Relatively simultaneously. It’s really powerful. And without it, this kind of architecture really would not work. But it means that when the Matching Engine accepts an order, or produces an execution, both parties to that execution see it at the same time. The market data sees that at the same time. The Drop Port sees at the same time. The trade reporter sees at the same time, et cetera. There’s only one downside. This is UDP, which means it’s unreliable.

There are no sequence numbers. There are no acks, because it’s not really connection oriented. You have to be prepared for the fact that these messages can get dropped. And so, we add to that a series of retransmitters. And these are just servers whose sole job is to record the messages that have been seen. And if they missed the message identified by some sequence number on the message, they ask each other. And if neither of them has it, they can ask the Matching Engine, who does keep an in memory copy of the messages that he’s generated. And hence can resend to them. And so you’ve insulated the Matching Engine a bit, and yet provided some reliability.

So any questions so far? We’ve thrown a lot of boxes on there. Yep?

So I gather the port boxes are not actually TCP/IP ports, in a traditional sense. That s a separate machine, meaning essentially some load balancing to your clients.

It’s generally a process that presents to each client, a TCP connection. There’s a variety of industry standard protocols where you need to provide… Like FIX, or others, slightly more efficient ones. Where you need to provide your clients, an in order, sequenced list of messages related to their trading. That’s typically TCP. The transaction internally here is entirely UDP.

But the Port is not a port on the Matching Engine.

Nope. It’s typically an entirely separate application on the Matching Engine. I guess you could think of each box as a process. Think of it that way. Typically, we would also keep these lines. Just practically speaking, a lot of these things are multihomed. So there’s a private network, which has a lot of this multicast traffic, which involves every participant’s trading. And then there’s a public access network, which involves some access control, and some firewalls, and things like that. And so the machine’s running ports are typically spanning those two networks. Yes.

Does the prime connect to the port via TCP?

Typically, yes. There’ve been other options. But generally this port allows us the opportunity to insulate ourselves from bad clients, and to normalize a lot of the information, and offload some work. So one example is, the Matching Engine typically has very efficient data structures, where he pre allocates all the representations for orders, and he saves them in memory. And so when he acknowledges an order, he essentially gives a pointer back to the port. Saying, “Well, the order that I acknowledged is right here.” He’s not going to return that memory address to the client, but he holds onto it.

And when the cancel comes from the client… And he does a variety of work, checking to make sure that a number of things are valid. This isn’t a token that’s been reused in the day. He then looks it up, he finds it. And then he sends that back to the Matching Engine with this locate number. It’s like a cheat code to find it really quick. So the ports are doing quite a lot of work for us. And they’re also controlling, or they’re providing some level of flow control. Which we’re probably going to talk a little bit about later. And that’s one of the problems that comes up with UDP, is that you don’t have any throttling. UDP is not going to essentially be concerned with the health of the network generally.

But now that we have this re transmitter, we can build these applications to be able to replay all the messages from the beginning of the day. And so, if we have the discipline to write each of these applications, using the kinds of techniques that I guess in distributed systems are called state machine replication. Where each of these applications is written just to maintain a certain series of state, and just deterministically apply these transactions, which have been ordered by the Matching Engine, bit by bit, to build up their state.

Then we can shoot any of these in the head, and rebuild exactly what they had. So if a client port has a hardware problem, you can bring up another one, on say another machine, or the same machine, move the IP to that other machine. And when that process comes up, replays all the activity from a re transmitter. He rebuilds all of his currently open orders. He rebuilds what the sequence messages were, that he was doing with that client. If some happened while he was down, getting switched over, he’s going to also add those. And so when the client reconnects, they’re going to exchange, and have a handshake where they’re going to agree on, “Well, I saw up to sequence number N.”He says, “Well, actually I have N plus two.” And he’s going to send the next two messages. And then everybody is synchronized.

Okay, everybody follow that? There is a minor problem with this. We’re missing, certainly at least one thing. What’s the system that I can’t shoot in the head? Matching Engine. So we’ve got to fix that. Presumably before we start. We did fix that, right? Hold on. So anybody have any ideas on how we can fix that?

We run a secondary Matching Engine. But what does he do? What does he listen to? He’s going to come up, and he’s going to listen to, not the ports. Because the order in which the UDP packets are arriving at the passive Matching Engine, they’re totally no guarantee there. So what he actually listens to, is the Matching Engines output. He’s able to, and we’ll see some messages later. He’s able to essentially identify only those messages that the clients submitted. And he’s going to essentially run the identical state machine, and actually the identical code that the primary Matching Engine is running. And hence, if the Matching Engine were to die in the midst of a transaction, like the earlier one we saw, where the order comes in, and it generates two executions. Let’s say he generates one, and then crashes. The passive magic engine doesn’t care. Because as long as he saw the order that inspired it, he’s going to just mechanically generate those identical messages right alongside.

And then we do have a careful process, by which if we lost this Matching Engine, an operator switches over to this one. And there are systems like Paxos, which essentially have algorithms to vote, and decide on a primary talker who’s going to be doing the sequencing. Those algorithms are pretty complicated, and they involve an extra round trip and a hop. And so I don’t know of any exchanges that actually use anything like that. Most of them essentially have fail over mechanisms that are pretty close to this. And I’m curious if anybody is aware of any, or if anybody actually worked on any Paxos implementations. No? Okay.

Question. When does it code how correlated they are?

How what?

Correlated. If a sequence event crashed the first one, what’s the probability of the second one crashing?

Yeah. It’s a really good question. Well, from a code perspective yes. Certainly you might have some differences in hardware. OS issues, things like that. But yeah, we have to be really careful about this, the state machine. And making sure that what’s on here is extremely clean. Extremely clean. And we do a lot of testing. And functional languages are good. But yes, Claudiu points out a very good point. That it’s a little subtle, it’s a little brittle.

It’s a double edged sword.

It is. But that is actually the way it works. That is actually what we do. And you can certainly bring up a Matching Engine in the midst of the day, towards the end of the day, and rebuild that state.

So why do we need to be fast? We need to be fast for the obvious reasons. The millions of transactions that we’re doing. But we also need the applications, those little boxes, to be really efficient, because everyone sees every message. So you really have to have a strategy for discarding those things that you’re not interested in. And you’ve got to be able to do it quickly.

Latency determines throughput. We have a simple flow control mechanism. Those ports can only do a single transaction at a time, one unacknowledged transaction in flight. So that TCP port that the connections are made to clients, they can keep sending orders. But it has a very small socket buffer, and if they fill it up, those rights will block, and it will essentially push back through the normal TCP window system. So the throughput of any one port, is really a function of the latency in the system. And so we’re using a lot of techniques, relatively commodity hardware, to keep those numbers down in the low single digit microseconds. So the internal transactions are extremely quick.

Replay. There are ways to get around the recovery taking a long time. You can essentially have servers that transfer the current state to a machine, without having to replay the whole state. But the more diverse set of applications you have, this becomes more of a pain. So if you can essentially rebuild the state of any application inside of 30 seconds, or a minute, you really don’t need to do any more work than that. And so the speed allows us to keep the system really simple from that perspective. And this is actually doable with, like I said, relatively commodity hardware. We do buy nice NIC cards, but relatively commodity hardware. And we do it with safe languages. I’ve done it in everything from C++, Java, and now in OCaml. And OCaml has really been a great language to do it.

I have probably a whole other talk, where I can talk about some of the things that we did to get OCaml to raw memory, C level performance. And the runtime is actually quite simple, and that’s a real benefit. I think it was actually harder to do reliably with Java. Because a more complex garbage collector, a more complex multithreaded runtime, makes it really, really hard to get this level of determinism.

And so we’re going to talk about that more probably in future talks. But we don’t do it with the standard idioms. And when I did it in Java, or in OCaml, you can’t use tons of deferreds, and allocate things all over the place. Basically the applications involve: data arrives, ideally without any copies involved whatsoever. It arrives in memory. You read some values out. You change some data structures on the side. Usually with pooled values that you reuse. You go back and you do it again. And you aim to allocate very little in the critical path, and you aim to be prepared upfront, with all the memory you’re really going to need. So there is more parallelism that’s possible with this system. What’s an obvious form of parallelism?

Yeah. Symbols. Symbols jumps up as one. But again, that has to be done carefully. When you paralyze at the level of assembly, you lose some things that we really enjoy. So one of the things we do at James Street, we do a lot of ETF market making. You have a lot of ETFs that have very common components, and overlaps, and factors. And so we like to be able to set risk limits across the whole market, that we can enforce atomically in the Matching Engine. And so if we really do paralyze, and we’re matching independently, we lose a lot of that ability. And so, while you can certainly do it in a variety of ways, it’s not always ideal.

There’s certainly a decent amount of parallelism you can do, certainly within a machine to have multiple cores. Although strangely enough, and this is going to get a little into the weeds, I think that the vast majority of the game, comes in having multiple memory fetches going out at the same time. So when I’ve profiled some of these Matching Engines that are running really fast, they actually spend about 30% of their time on a single line of code. And that is, remember I told you that pointer? It’s actually not really a pointer, it’s typically just an array index. De-referencing that and finding the order, generally is a cache miss that is one of the slowest parts of the machine. So when you actually operate on multiple cores at once, you essentially have memory pre fetches going at the same time.

So you actually could read, say the next batch of messages, issue pre fetches for those, and probably do similar though not quite as well. Now, if you paralyze the whole system at the level of a symbol, you really need to give people different ports. Because if we’re going to give you a single sequenced stream of messages, going back to this architecture… If I stack up Matching Engines, but my client still has a single port, well now I’ve got a problem. I’m injecting state that’s not recoverable right here, when I decide that order that I’m going to present it to the client. Everybody follow that? There was some exchanges that did that. They had multiple Matching Engines. They had one essentially sequence stream. And I think that the cost and the complexity caused them to say, “You know what, we’re actually going to give you one connection, but we’re going to give you multiple sequence number spaces. And we make no guarantees that we’re going to give you them back in the same order. We’re just going to essentially proxy connections for you. And that’s it.”

And if you’re going to replicate it at the level of symbols, you’re basically replicating this whole thing. And you’ve just got multiple instances of this architecture. For the options market NASDAQ did that. They had multiple instances, and then they had another one in front of it, to hide that that was happening. And people were concerned about the latency. And over time they eventually just went directly to what they called the rings themselves.

I want to talk a little bit about, in a little more detail, what I call the locking. Which is essentially how we deal with transactions that are trying to change the same state. When I worked on this system, back in the Island days, we actually were very specialized for the transactions in trading.

We were looking at each transaction, and we were trying to say, “Oh okay, I asked for this to happen. I got this message back from the Matching Engine. Does that satisfy what I was doing? Well, I sent an order and this was a cancel.” And there was a lot of business logic in that. So over time I started to try to generalize it, so that I didn’t care necessarily what the transaction was. And so what we have now, is every message has a unique topic. And you can just think of each topic as some independent lock. And we have a sequence number per topic. Again, this is different than the total sequence number that we’ve assigned on to all the individual messages.

So each contributor, and a contributor as a port, essentially tries to propose the next transaction, and assign the next sequence number for that topic. And if someone else gets it, he essentially has to give up, accept that their transaction happened, change his state machine, and then try again. This is another reason for the one in flight semantics. Because this rollback process is a lot easier. If you remember the message that caused you to want to propose a transaction, and it turns out you have to roll it back to some extent, you really only have to keep that one message. So this is my attempt of doing this in a timing diagram. Anybody remember the Richard Stevens books on TCP/IP? It’s like a Stephen style time curve.

So this is a little unfortunately named, but P1 is the first port. And so this P1 here, is essentially the publisher. The guy sending this transaction. This second P1 is the topic. Then we have the sequence number. And then we have the actual message, what is he trying to do? So port number one tries to buy 43.50, a hundred shares. The Matching Engine sees it, and he says, “Okay. Is three the next sequence number for port number one?” “It is.” “Great. Okay.” He basically just turns it right back around, multicast it out, everybody sees that a new order has been acknowledged. Second port does the same. P2, that’s the publisher. P2 is the topic. Eight is the sequence number. In this case, he’s selling at 43.49. And note, these two overlap. This is a marketable order, so we have to generate a transaction. Who generates the transaction? It’s going to be the Matching Engine. So can anybody guess what the next message is going to look like? Probably have a 50/50 shot. No? All right. Well then I’ll show you.

So the Matching Engine is the publisher in this case. And he’s publishing on P2’s topic. And P1’s topic. So eight was the sequence number that this guy asks for. He got that. And then he also got a nine, which said that he successfully sold those hundred shares. And topic number P1, he also got a message indicating that he bought those shares. So what’s an instance of a collision that we could have here? P1 actually at this time, not knowing that this transaction happened, tries to generate a cancel. He tries to pull that order back from the market. Well as far as he’s concerned, that sequence number four. He sends it to the Matching Engine, the Matching Engine looks at it and says, “Is four the next sequence number for P1? No, it’s not. Four has already happened. So what does the Matching Engine do?

Nothing. He doesn’t reject it, he doesn’t knack it, he just drops it on the ground. Because he knows that all these messages are reliable, and this guy is going to eventually see that four happened. When he does, he says, “Oh great. I was waiting for four, but unfortunately it wasn’t my four.” He knows that because the publisher ID doesn’t match, it was not the one that he sent. So he then has to take this execution, apply it which removes that order.

Then he says, “Okay, let me go look again at what I was trying to do before all this happened.” He says, “I was trying to cancel this order. Let me try again.” Cancel one order, the order’s gone. So now he has to generate a cancel reject. Again, that’s going to be now for sequence number five, because he’s already applied four. And he’s going to bounce it off the Matching Engine, and all is well and good.

Now this is a little special. For certain kinds of transactions, you could get into something like a live walk scenario, where people were continually fighting on the spin. But for these kinds of transactions, most of these terminate pretty quickly. Because it’s usually a case like this, where this order is now gone. And so you don’t get a lot of transactions that can actually persist. But there is some judgment in how you apply this kind of architecture for sure.

Why do you have to send a cancel reject to the Matching Engine? Why does the Matching Engine have to know that?

It’s a good question. You would think it was a planted question. Because people always ask that, and part of the slide is like, “Why do I have to do that?” It’s actually because of the protocol. In the protocol, cancel rejects are sequenced. And if it’s sequenced, I can’t essentially add that to my state, because if I crashed I can’t replicate it. So anything that you’re going to take action on, that you’re going to show to a client, that you’re going to change your state machine with, has to essentially come from that global totally ordered stream of messages. If the protocol said, “Cancel rejects are just a thing that I send, and I don’t remember them, and move on,” then you don’t have to do that. But thanks for asking. Good bow tie. Everybody see that bow tie? It’s nice.

How you prevent the against a Matching Engine dual write, against two topics?

How do you protect against the Matching Engine doing a dual write against two topics? I’m not sure I follow the question. Walk me through.

The Matching Engine is sending two messages, two ports. And if one of those fails, that doesn’t seem atomic. Or maybe I misunderstood.

Well, also a good question. The only thing that has to happen is this cell. If he dies writing this message, or writing that message, we don’t really care. Because as long as we have a passive Matching Engine, as long as we can bring up that logic, and we get as far as this message, he’s going to mechanically generate those next couple of messages. Because he’s just going to apply the state machine. He’s going to deterministically apply those rules. He’s going to generate those messages. The passive Matching Engine actually ignores those messages from the publisher ID, any. Because, he should be able to generate them. And so there is some carefulness that these messages are atomic, and that’s at the level of a UDP packet in the sequence numbers. So you don’t really get a partial message per se, that would be a problem. But as long as that message happened, then this is going to flow on. Does that answer your question?

Yeah.

You agree?

Yes.

Good.

So what are the takeaways? Speed. People talking about speed in the financial markets. It’s not just for speed’s sake. Speed and determinism do translate to better prices, but they do allow us to do things with the architecture, and keep things simple in a way that we couldn’t, if we couldn’t hit some of these performance numbers. State machine replication, it’s a really great technique. It makes things extremely testable, repeatable. It’s a really beautiful way to program. When we roll out new Matching Engines, we replay all. And when I say we, I mean Pete. We replay weeks of past data through that state machine, in addition to doing fuzz testing and things like that, where we make up messages and put them through. And state machine replication lets us do that, in a really easy way. Now you need a certain amount of discipline. You can’t just act on the timer on that box. You have to come up with ways, where anything where you would like to change the state has to essentially be represented. But once you fall into that, it’s a very nice programming model.

The determinism is great. Again like I said, if a regulator wants to know, “Why did you do this transaction at this time?” There’s really no uncertainty. It’s not, “Well, I have multiple threads, and I might’ve gotten this quote at this time.” Nope. “I know exactly what I knew at every point in time when I made a decision.” And that is super, super helpful.

As I talked about before, more parallelism is possible. And I’m happy to go into some of the details about where I think that’s doable, and what some people are doing about it. Now, I know that the NASDAQ matching is I believe is running eight threads, and they have an ingress thread. They fan out and they have symbols allocated to the different cores. And then I think they’re bringing it back together for publication.

There’s lots of different ways to do it, it’s a really interesting problem. At least I certainly think so. And even though it’s a crazy looking design, as Ron has said, a lot of these design decisions hang together. Each one of them by themselves, I could argue. But when you step back and you look at the whole picture, it really creates a system that solves a lot of the problems that exchanges have, in a nice clean way, and scales fairly naturally. So if you found this interesting, we’re glad you came. We hope you enjoy the food. We’d like you to take a T-shirt. But also… Sorry, I just think that’s the hardest sell I’m going to give. But that’s it. I’m happy to take any questions.

The next great idea will come from you

View open roles