Performance engineering at Jane Street
We’re constantly chasing better performance at multiple scales, whether we’re shaving nanoseconds off the critical path in a trading system or optimizing the training loop for an ML model.
See below for examples of some performance problems we face and the tools we’ve invented to solve them.
Keeping up with the market
As markets grow, our trading infrastructure needs to process ever-growing amounts of data in ever-shorter time windows. That’s why we build highly optimized packet-processing systems capable of handling millions of multicast messages per second on a single core. Building this kind of system requires a disciplined approach to measurement, a focus on determinism and tail events, and a good dose of mechanical sympathy.
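To make that concrete, here is a minimal sketch (not our production code) of the kind of allocation-free receive loop such a system is built around: one preallocated buffer and a tight loop over a socket that we assume has already been bound and joined to the multicast group elsewhere; handle_message is a hypothetical callback.

```ocaml
(* Sketch of an allocation-free UDP receive loop. The socket is assumed to be
   set up (bound, multicast group joined) elsewhere; [handle_message] is a
   hypothetical callback. *)
let rec receive_loop socket buffer ~handle_message =
  (* [Unix.recv] fills [buffer] in place, so the hot path allocates nothing. *)
  let len = Unix.recv socket buffer 0 (Bytes.length buffer) [] in
  handle_message buffer ~len;
  receive_loop socket buffer ~handle_message

let run socket ~handle_message =
  (* One buffer, allocated once, reused for every message. *)
  receive_loop socket (Bytes.create 2048) ~handle_message
```

Keeping allocation off the hot path is only one ingredient; the measurement discipline mentioned above matters at least as much.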
Performance isn’t just important for the most latency-sensitive trading. We’ve built a distributed systems framework based on state machine replication (and inspired by the architecture of financial exchanges) which provides high throughput, low latency, and strong reliability guarantees to a wide variety of internal applications. The architecture of this system depends on a very high-performance backbone for sequencing, distributing, and filtering the transactions that drive these applications.
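The core idea is simple enough to sketch: every replica applies the same sequenced stream of transactions to a deterministic transition function, so all replicas converge on the same state. The names below (Sequenced, Replica, apply) are illustrative, not our actual API.

```ocaml
(* Illustrative state-machine-replication skeleton; not Jane Street's API. *)
module Sequenced = struct
  type 'a t = { seqnum : int; payload : 'a }
end

module Replica (M : sig
  type state
  type transaction

  (* Must be deterministic: the same state and transaction must produce the
     same next state on every replica. *)
  val apply : state -> transaction -> state
end) = struct
  type t = { mutable state : M.state; mutable next_seqnum : int }

  let create initial_state = { state = initial_state; next_seqnum = 0 }

  (* Apply transactions strictly in sequence order; a gap means a message was
     missed and must be recovered before processing can continue. *)
  let handle t (txn : M.transaction Sequenced.t) =
    if txn.Sequenced.seqnum <> t.next_seqnum
    then failwith "sequence gap: request retransmission"
    else (
      t.state <- M.apply t.state txn.Sequenced.payload;
      t.next_seqnum <- t.next_seqnum + 1)
end
```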
At the lowest level, we use FPGA accelerators as a way of achieving performance that can’t be gotten from CPUs alone. We lead development of Hardcaml (“Hardcaml: An OCaml Hardware Domain-Specific Language for Efficient and Robust Design”, arxiv.org), an open-source hardware design library (github.com/janestreet/hardcaml). By building our own tools, we’ve been able to create a highly productive hardware design workflow, with fast feedback for engineers and integrated simulation and testing. If you’re interested in seeing Hardcaml at work, some of our engineers used it to win the 2022 ZPrize competition for accelerating zero-knowledge cryptography, and we’ve posted the detailed results at zprize.hardcaml.com.
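For a flavor of what designing hardware in OCaml looks like, here is a tiny circuit along the lines of Hardcaml’s introductory examples; exact module paths and entry points may differ between versions.

```ocaml
(* An 8-bit adder described as ordinary OCaml values, then emitted as Verilog.
   Adapted from the style of Hardcaml's introductory documentation. *)
open Hardcaml
open Signal

let sum =
  let a = input "a" 8 in
  let b = input "b" 8 in
  output "sum" (a +: b)

let () =
  let circuit = Circuit.create_exn ~name:"adder" [ sum ] in
  Rtl.print Verilog circuit
```

Because circuits are ordinary OCaml values, the same description can be simulated and tested in the integrated workflow mentioned above before it ever touches an FPGA.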
Accelerating machine learning
We do a lot of machine learning, and performance engineering is a critical part of that work. Making good use of our GPU clusters requires careful profiling and optimization of our training runs across the whole stack, from storage to network to host.
In most of the ML world, inference is largely a throughput problem, with responses aimed at human timescales. Because our models drive microsecond-scale trading, we need to architect for latencies far below those that are typical for ML workflows, while handling high-throughput market data. This leads us towards a variety of techniques, from writing heavily optimized CUDA code that stretches the bounds of what GPUs were designed for, to leveraging custom hardware, to writing our own compilers.
Industry-leading tools for performance debugging
Magic-trace
We developed magic-trace, a powerful open-source tool for collecting and displaying high-resolution traces of what a process is doing. It’s useful not only for detailed performance debugging, but also simply for understanding your program. Magic-trace uses Intel Processor Trace (Intel PT) to snapshot a ring buffer of all control flow leading up to a chosen point in time, which it then presents to users in an interactive timeline.
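One common pattern is to trigger a snapshot from inside the program when something interesting happens. The sketch below assumes the optional in-program hook (Magic_trace.take_snapshot) that magic-trace’s documentation describes; treat the exact name and threshold-based trigger as assumptions, not a prescribed workflow.

```ocaml
(* Sketch: snapshot the recent control-flow ring buffer whenever a call runs
   slower than expected. [Magic_trace.take_snapshot] is the in-program hook we
   assume magic-trace provides; see the lead-in above. *)
let with_snapshot_if_slow ~threshold_s f =
  let start = Unix.gettimeofday () in
  let result = f () in
  if Unix.gettimeofday () -. start > threshold_s
  then Magic_trace.take_snapshot ();
  result
```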
Memtrace
We also built memtrace, a tool for understanding memory usage and finding leaks. Memtrace builds on OCaml’s statistical memory profiler to get callbacks on GC events for a sample of a program’s allocations. The Memtrace viewer then analyzes these events and presents graphical views of them, as well as filters for interactively narrowing the view until the source of the memory problem becomes clear.
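Getting a trace is deliberately low-friction: per memtrace’s README, a program opts in with a single call, and tracing only activates when the MEMTRACE environment variable names an output file.

```ocaml
(* Typical memtrace setup: call this once at startup. Running the program as
   MEMTRACE=trace.ctf ./my_program streams sampled allocation events to
   trace.ctf, which the memtrace viewer can then load. *)
let () =
  Memtrace.trace_if_requested ();
  (* ... the rest of the application runs unchanged ... *)
  print_endline "application running"
```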
Pushing programming language design for high performance
We write our lowest-latency software systems in OCaml, which combines a powerful type system with good, predictable performance and a low-overhead runtime. Over the last couple of years, Jane Street has developed major extensions to OCaml, in particular:
- The addition of modal types (icfp24.sigplan.org) opens up a variety of ambitious features, like memory-safe stack allocation, type-level tracking of effects, and data-race freedom guarantees for multicore code.
- Unboxed types (github.com) give more control over memory representation, in particular allowing structured data to be laid out in a cache- and prefetch-friendly tabular form.
Together, these extensions bring some of Rust’s best tools for writing high-performance code into a simpler, more ergonomic type system, preserving the relative ease of programming in OCaml.
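As a taste of what stack allocation looks like in practice, here is a sketch in the style of our public posts on the locality extension; the concrete syntax is still evolving, so treat it as illustrative rather than definitive.

```ocaml
(* Sketch only: the locality extension's syntax may differ from what ships.
   [local_] asserts that the intermediate pair never escapes this function,
   so the compiler can place it on the stack rather than the GC heap. *)
let manhattan_distance (x1, y1) (x2, y2) =
  let local_ delta = (abs (x1 - x2), abs (y1 - y2)) in
  fst delta + snd delta
```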
We’ve also made a wide variety of other improvements to make OCaml more efficient, from adding prefetching to the GC during marking (github.com) to improving code generation and register allocation in the compiler’s middle end and back end.