Production Engineering at Jane Street

About the role

Jane Street has an extensive tech landscape. By and large, our software engineers are responsible for both the development of their applications and ongoing production support. But some applications, whether due to strict response requirements or some other characteristic of the production environment, require dedicated personnel to provide support.

Enter production engineers.

Production engineers wear two hats: we divide our time between a support rotation in which we actively resolve live issues, and project work to improve our production infrastructure. (You can read the job description to learn more.) Below, we’ll describe both in more detail to help you decide whether this is work you’d enjoy.

The support rotation

This is the primary service that production engineers provide the firm. We staff rotations for a given suite of applications in order to provide timely resolution to whatever operational issues come up throughout the trading day. We spend between a third and a half of our time in these rotations, typically in weeklong segments so that our project work remains uninterrupted.

At their most exciting, those issues are novel incidents, possibly arising from a bad deployment, a new kind of trading, or a change in the external world. Resolving these incidents requires a cool head, the ability to coordinate among multiple internal and external parties, and excellent debugging and problem-solving skills.

And at their most mundane, those issues are known bugs that we haven’t fixed yet because we judged that we had the capacity to take on some additional toil in exchange for getting new features or new trading flows enabled. As production engineers, we’re empowered to find solutions that eliminate such toil for the team.

Things we enjoy about support

Quick wins: Writing software can take weeks of work to produce an unobservable gain. On the other hand, resolving production issues produces lots of observable gains really quickly — getting these types of wins can feel great.

Debugging can be fun: Debugging hairy problems, digging into unfamiliar systems, combining multiple incomplete data sources to build a full story… if that all sounds like fun to you, then you might enjoy being a production engineer!

Great learning: You’ll get exposure to a wide variety of Jane Street systems and see how different teams do things, which can inform how you build your own software.

It’s lively: Our work is definitely not sitting down with your headphones on, hammering out code. While on the rotation, you’re more likely to be up on your feet, coordinating with multiple teams (maybe including external parties) to resolve an issue.

Insight into the business: Understanding how the software is used gives you a very close look at the business impact as well. This allows you to pick up a lot of context about how Jane Street’s business functions.

Things to keep in mind

We enjoy support, but it’s not for every engineer. Specifically, you should know that your time on support can be:

Interrupt driven: We don’t expect people to get any project work done while they participate in the rotation, because the frequent interruptions make that impossible.

Repetitive: We try to deal mostly with novel issues. If we’re handling the same issues that we dealt with a year ago, then we aren’t doing a good enough job automating things. But inevitably, issues will recur. When that happens too often, it can be frustrating to deal with an issue that you know how to resolve and just haven’t had the time to fix yet.

Sometimes stressful: Dealing with incidents can be fun, but there’s no denying that it’s stressful to have production trading potentially depending on your ability to resolve a live issue.

These aren’t negatives to everyone. If you enjoy doing “real-time” work with an element of pressure (like working in an emergency room, performing in a stage play, or competing in organized sports), you’ll probably enjoy the variety and excitement of support (we all do)!

Project work

When we’re not participating in the rotation, we work on projects to improve the reliability of the software we support. There is too wide a variety of work to enumerate here — for example, sometimes these projects involve writing a lot of code, and sometimes none at all — and different production engineers will tend to work on different kinds of projects. But to give you a taste, here are some of the kinds of projects production engineers work on:

Incident follow-through

Owning a postmortem to resolution often stretches beyond the support shift. Coordinating between multiple teams to figure out which systems should be responsible for which changes, which action items are most important, and so on, can take a long time, but it is crucial if we want to avoid seeing the same incident recur a few weeks later.

And it’s similarly crucial to wrangle the long tail of smaller issues, ensuring that small “one-off” tickets don’t get dropped. This tends to require good project management skills, system understanding, data analysis, and the ability to recognize themes among seemingly unrelated issues.

Application work

Many production engineers work closely with application developers on the specific applications that we help support, but with a focus on improving interactions with the team’s systems in production. This most obviously includes the reliability and supportability of the applications and can extend to areas such as user experience.

The work itself, whether it’s improving noisy alerts or making recovery from incidents easier or something else, brings obvious immediate benefits.

But we think it brings longer-term benefits as well. Production engineers, by supporting a wide variety of applications, often have a broader perspective than individual application developers on what makes applications easy to support in production. Conversely, application developers typically understand their own applications in a deeper way than production engineers otherwise would. We believe working together is the best way to build systems that share the best practices developed across the firm, while still handling the unique challenges posed by the specific problem at hand.

Infrastructure tools

Production engineers are also responsible for building a lot of the tooling that’s required to run an effective support rotation. We work on alerting libraries, toil tracking and visualization, logging libraries, and more. Proofs-of-concept for individual teams are often upstreamed into services that are used broadly across the firm.

Improving our response

We mostly try to prevent issues from needing a response in the first place, but we also put effort into making sure that our response is excellent. Production engineers run incident training drills, improve documentation, run boot camps and teach-ins for the firm as a whole, train new joiners to the team on the production environment, and more.

Other operational work

There’s also some operational work that doesn’t relate to ongoing production support. We may do things like manage deployments, configure systems for new trading, or coordinate and follow through with external counterparties.

What our work is not

In the wider world, the term “production engineering” is sometimes used to refer to a broader concept, including systems work like performance tuning of Linux platforms, data center management, and so on. If you’re passionate about those topics, you may be interested in applying to our Linux Engineer role.

1. Just like most of Jane Street, we write almost all of our software using OCaml. And just like pretty much everywhere else at Jane Street, we don’t expect you to know any OCaml coming in — we’ll teach you what you need to know.
