Pyrefly: Type Checking 1.8 Million Lines of Python Per Second

Neil Mitchell

Transcript

00:05

Welcome everyone. Today, we have Neil Mitchell from Meta. Neil used to be a programming language researcher in a past life and then he worked on Meta’s New Build System, well, relatively new build system, Buck2, and now, he’s working on Python type checking and he’s going to tell us about Pyrefly, which is a type checker and language server for Python. We’re very excited to have you, Neil.

(audience applauding)

00:34

Thank you very much, Roman. As Roman says, I’m a programmer at Meta and for the last nine months or so, I’ve been working on the Pyrefly type checker, which I’m gonna tell you about today. So, I guess, first things, who here uses Python on a weekly basis?

00:51

Okay, it’s quite a lot for a traditionally OCaml shop. Who here uses types in Python on a regular basis? So most of their code.

01:00

I mean, most of you, I think you should use more types. So what some people are surprised to discover, especially given Python’s history, is that Python actually has types and has had types since about 2015, so over a decade now.

01:19

So if you write a little snippet like this, you can say that the X has type int and that the return type of this thing is a string. And we all know types, we all love types from our OCaml days. And now, we can write this in Python. But unlike most languages, in Python, if you write a type, it doesn’t really do anything. The interpreter just walks straight on by it without really having much effect. Some effect, but not much useful effect.

01:53

And so once you’ve got some types, there are two things you can do with them. You can do some runtime type checking. So, at runtime, when you start calling this function, you could check that the X really is an int and that the result really is a string. Or you could do static checking and check the entire program before you start. And in Python, because there isn’t one baked into the language, there are many versions of both. So typeguard, pydantic, enforce being the runtime ones, obviously, which come with a cost. And a plethora of static typing options. So there’s the one from Microsoft, the one from Google, and the two from Meta along with one from the Python Foundation. And Pyrefly is the one that I’ve been working on and that I’ll be talking to you about today.

02:40

The other thing to say is that Python types are actually a lot more complicated than you might think. They are in some ways more complicated than OCaml types, more complicated than even Haskell types in places, although not really dependent types. So they have generics, literals, higher orders, data class transforms, narrowing, it’s a very rich type language. And there are also standards documents and conformance tests for these, so it’s a useful part of the language.

03:10

So Pyrefly that I’m gonna talk to you about today is a Python type checker and language server. So it was built very much to be both of these because to do a good job at IDE stuff, you need to do a good job at type checking. It’s open source and follows the standard. So if you wanna get the source code, there’s a GitHub page linked to from pyrefly.org. It’s fast and parallel. We wrote it in Rust, so all the fearless concurrency stuff that Rust talks about, all good stuff.

03:43

The team that wrote it originally wrote the Pyre type checker. So we have a history of writing Python type checkers, this is our next one. And one of the things that sets Pyrefly apart from other Python type checkers is we really go heavy on inference, the idea that you should be able to write a program and put some types to give the type checker a hint where you’re going, but it should try and figure out the rest for you so that you know, what is quintessentially Python-like remains Python-like and it doesn’t become, you know, Java with all its type signatures and all.

04:20

So this is an example of the Pyrefly Sandbox that you can get to at pyrefly.org/sandbox, showing you the kind of things it does. So I’ve written my code, the bit in gray, it actually just inserts, I didn’t write that, it just inserts, it figures out the type and then inserts it in both the IDE and the sandbox. And so it has guessed that this expression always is a string so it can tell me and now it can type check using that information. I wrote Y equals an empty list and it’s gone, “Ah, you mean, a list of string, that must be what you mean.” And so when I do a test calling it with y[0], it knows it’s a string, it knows I’m expecting an int and I can get a type error saying, argument string is not assignable to parameter X with type int in function test. So it can give me a lot of hints about what’s gone wrong.

05:20

What’s interesting about this particular example is that in Python, if I pass a string here, it will work just fine. So I’ve used the types to restrict relative to the Python runtime errors, but usually, the types roughly failing the type check would lead to a runtime error, if you called it. And then because the results of test is always a string, I can get some nice auto complete. So you know, the standard IDE features. If you have any questions at any point, approach the microphone, I’m happy to take questions as we’re going, but microphone so that everyone on the video can hear it.

05:59

So how did we get to Pyrefly? So you might have heard of this thing called Instagram. It has 3 billion monthly active users and the server for Instagram is a really big Python program. About 20 million lines, over 20 million lines. And then at Meta across all of the developers, we have about 3,300 developers working on Python on any given day. So at Meta, we’re doing a lot of Python and we’re doing Python at very large scale. And when you’re working at very large scale, if you wanna do any meaningful change to the code without it all breaking and going all horribly wrong, a type checker is pretty important.

06:45

So in 2017, so this would be a couple of years after Python first got its basic forms of types, we started work on Pyre. MyPy was the standard solution in Python at that time coming out of a PhD thesis for how to check Python, but when you gave it, I’m sure it wasn’t 20 million lines at the time, but it was probably 10 million or so, you kind of very quickly ran into this performance and scale problems. So we turned to OCaml to write something much faster, much better than a Python type checker written in Python so that we could scale it to the kind of size we needed.

07:27

And if you’ve ever seen work from Meta, you’ll know that a lot of our stuff are PHP. We kind of took PHP and turned it into Hack by adding types. We took JavaScript and turned it into Flow by adding types and both of these type checkers share a bunch of DNA in common. And the first version of Pyre was an evolution of that codebase to take untyped Python and turn it into typed Python. So it’s very important to our developed flow. If we lost Python types, we would be in serious trouble.

08:04

But there are problems with Pyre and this is when I get to stand up in front of a Jane Street audience and say, OCaml wasn’t a very good choice for us. So OCaml wasn’t a very good choice for us at Meta and in particular, there were various things about OCaml that didn’t suit our environment, some of which were self-inflicted because we didn’t invest in enough time to make OCaml work in our environment and some of which were consequences of the evolution of OCaml. So Windows has always been a problem. I know there are ways to get OCaml working on Windows, but you know, it’s not a first-class citizen. Parallelism was hard, it’s easier now, but back when we wrote OCaml, the entire Pyre codebase uses multi-process for parallelism and then communicates through shared memory with lockless data structures, which is super cool, but not very friendly to debug or easy to develop.

09:03

It was a big barrier to our open source contributors. So while the other type checkers saw open source users, we would have one or two people who took enough time to learn OCaml so they could contribute to Pyre, but we didn’t have an influx of any particular developer group. And you know, I appreciate many of these problems have been solved at Jane Street, but they weren’t solved in 2017, which is when we started. So it wasn’t a great fit for us, but let’s not suggest that OCaml was the reason that Pyre failed by a long shot. Like there were many structural issues with Pyre, so it started out as an abstract interpreter, it used things like fixed points, it extensively desugared the code before it got to type checking.

09:48

So it was one of the weirdest type checkers I’ve ever seen because it didn’t have a symbol table or scope resolution, like fairly standard things you might expect. And obviously, once you’ve built that and got a lot of users on it, it’s very hard to add retroactively. It also started as a command line back in the day, before LSP was a thing, I guess in 2017. So no one thought about how you would make this a type checker and it started closed source and never focused on open source users. So we tried to fix that.

10:23

So for Pyrefly, last year, about August, two of us started prototyping and we came up with seven prototypes of various ways you could construct a Python type checker. Are you going to use a constraint solver and just basically desugar the program into constraints and then solve them at the end? Are you gonna do a more traditional subset approach? Are you gonna do abstract interpretation? Maybe, this was always the right answer and we just got it wrong the first time round, it turns out, no, it wasn’t the right answer but we explored basically, the entire space.

10:57

We were pretty convinced we wanted to use Rust because it’s cross-platform, fast, memory safe, all these nice properties and is starting to get traction in the Python open-source community. You know, the ideal language would’ve been to write it in Python so that we could benefit from Python developers and you know, dogfood it. But that didn’t seem feasible because it was just too slow. Python’s just too slow. And the multi-threading which is coming to Python, it’s in 3.13.0 with free-threaded Python, which is another thing Meta’s working on, but it just wasn’t there at the time.

11:31

In October 2024, we decided it was going so well, we’d actually turn it into a real project. Pyre2 turned out to be a horrifically bad name because there is a library called, RE2, the regular expression library and there is already a Python binding for that that was developed by Meta called, pyre2. So we had to come up with a new name and Pyrefly was decided upon and since then, we’ve basically been fleshing out a type checker, which takes a very long time. We released an alpha in PyCon and here I am talking to you about Pyrefly, we’re still developing it, it’s still in progress, but it’s good enough to give a go. Has anyone tried Pyrefly, out of curiosity? Wasn’t expecting anyone to, but cool, you can try it after this talk.

12:26

So I’m gonna briefly tour you through the Python type system because it’s maybe not the type system you expect. So part of the reason that the Python type system isn’t the type system you’d expect is because it comes at it from a different angle. It isn’t, let’s build a language with types. Let’s pick a really nice structure like Hindley-Milner and stuff and evolve these two together. It’s, we have billions of lines of Python out there, what can we do to take the concepts that people are already using and basically, graft types back onto these programs that were not written with types in mind.

13:07

So there are many reasons for types in Python. Not gonna go through them ‘cause I’m sure you all know, but yeah, faster inner loop, make corner cases safer. And honestly, the biggest reason I see in the Python code I’ve worked with is usually, the exception handlers will crash when executed because no one tests the exception handlers and then when the exception handler gets hit, everything blows up and you don’t even get the nice exception message. So things like type checking are super valuable for that. Also, with LLMs, if you want to vibe code something, the LLM, if you have a way to check that it didn’t do anything too horrifically weird, then a type checker is a really nice way to do this.

13:53

And one of the quotes I saw on Threads the other day is, “2025 is the year of type checking in Python, I’m so excited.” And yeah, maybe it is, maybe it isn’t, we’ll see.

14:08

So because of the reason we have Python types, the trend is actually for the Python type system to get more and more complicated every year with more and more advanced features, so that code that was previously untyped can now have a type written for it. And this is done through a conformance test and a detailed specification, but even though, there’s a detailed specification, there are lots of choices that the Python type system leaves to the implementer, for example, how much inference you do and how much inference you do is pretty key for which programs you accept and which ones you reject with the slight nuance that if you do more inference, you probably reject more programs ‘cause if you don’t do inference, you have to kind of guess and over-approximate.

14:59

So if you’ve never seen Python types before, these are the basic ones. You can define a class and that is a type, and you can write int, str, MyType and place it, nothing too crazy here. Where it starts to get more unusual is that you can have a literal type. So for int, str, bool, and bytes, you can write literals, which is like a subtyping relation. So for example, the file open call in Python takes one of r, w, a with a plus or not on it to determine what file mode you want. So you can type that using literal or you could say, you know, this function always returns 42. And then there’s also things like, you know, you need to do computation with literals, so if you write a test with an append, you can figure out str literal.

16:00

I said earlier that the Python interpreter doesn’t do much with types. The one thing it does do with types is evaluate them and actually, that can cause a surprisingly large amount of harm because it means that if you have an import that you only use for type checking, then you have to pay the cost of evaluating it and evaluating all its imports even if you don’t actually use the type checking. So one thing you can do is, you can write your types in quotes and you can also, write type aliases just as X = int, all these features are nice and cool, but then when you see something like Z = hello, how would you decide if that’s a type? Any ideas from the audience? Is that a type or not?

16:49

Could be. Could be, correct answer. It could be, you have absolutely no idea. And not only that, but like, you know, maybe in this file, it’s used as a value and in another file, it’s imported and used in a type position, you never know. So this means that typing in Python has kind of evolved and there is now a type keyword that you can use to explicitly introduce a type alias which, you know, obviously means, it must be a type and that’s nice and clear, but there are lots of things like this that are a nightmare to type in Python, but you know, we soldier on.

17:25

There are also unions, fairly standard in the kind of gradual typing way. And then there’s the Any type which essentially is a static type, but I don’t know which. So if you’ve gotten an Any, you can assign it to anything. Oh, and it must have been the type that you wanted, and you can assign anything to Any because it also must have been the type you wanted. So that’s quite cool. But it does mean that things like Any or bool, isn’t the same as Any because you are saying, well, there definitely might be a bool, whereas Any might not contain bool, because you don’t know what static type it is. It always warps my head a little bit, that one. And then there are also things like Never/NoReturn for what in Haskell, I would call, the void type for uninhabitable exception must be thrown and you can’t reach this point.

18:32

So if you have an unannotated variable, what is the type of it? Well, some systems will say Any, some systems will try and guess, in Pyrefly, we pretty much always try and guess something better than Any ‘cause Any sucks. But if you have an annotated argument type, Pyrefly will always infer Any, whereas some other systems might inline that function into the calling function and see what happens, right, for type checking.

18:58

We’ve got generics, everyone likes generics, but we also have this weird flavor of generics where you can declare them with a TypeVar. So if you write X = TypeVar (“X”) and it is important that the X on the left and the X in the string match or it’s an error, that declares a type variable such that every instance of that type variable is a fresh instantiation of it. So these two functions are equivalent. So you used to only be able to write the TypeVar version, now you can write the [Y] thing to essentially mean, for all Y. So generics, that’s nice. But unlike normal generics, you can write constraints that this generic must be one of int or bool or str, or you could write a constraint that this must be iterable, so kind of like type classes. With TypeVar, you can specify whether it’s covariant or contravariant or bivariant with the second syntax, it infers it, perhaps not correctly, but it does infer it. There is an algorithm for inferring it. So sometimes, you still need to use the TypeVar to specify the variance.

20:10

It’s got structural subtyping or kind of anonymous type classes, interfaces, anything that has the close method on it will be an instance of the SupportsClose protocol. And there are standard ones like these for iteration collections, all that kind of stuff.

20:32

It’s got overloads. So I can write, the not function in Python, obviously can negate anything but actually, if I’ve got a literal True, I know it will end up in a literal False. If it’s a literal False, it will end up in a literal True, otherwise, it will just go to a bool. So this is an implementation of the not function and essentially, I can give it some fake signatures and it will match the first one that hits and if it doesn’t, it’ll fall through to this.

Are these checked, like if you make a mistake creating the overloads? Are these checked? So no, it will check that this is compatible with all of these. So these must all be restricted versions of this, but it will not check that this thing is True. You’re just expecting to get it right.

21:25

Higher order functions, we have those, but we also, have those perhaps more complicated than any other programming language I’m aware of, so Python, you can pass positional arguments or keyword arguments, which obviously, complicates things. So you can write a simple callable, this says, I take an int and a str, and I produce a bool, so nice and easy. But I can also write this horrific thing which says, I take a function that takes an int and then some positional arguments and some keyword arguments and some return type and some positional arguments and some keyword arguments and then they can call it, giving it a first int. So this ** is a ParamSpec, meaning, I can have generic types over keyword and positional arguments and there’s various rules, but this type checks in Python. So pretty cool, I don’t think you can do this in many languages.

22:31

Yeah, and then there’s narrowing which means that often, well, flow typing, if I tell you the argument to this function is a string or a none, if I write if X, well, none is always false. So in this branch, X must be a string so I can call capitalize on it and it’s definitely not gonna crash, else, I’ll return none. So we would infer that this thing returns string as well. There’s a lot of this narrowing going on.

22:58

There’s also data class transforms, which are horrifically scary, but let you do basically, database ORM solutions. I am not gonna explain those because I don’t understand them. And perhaps, one of the biggest disadvantages of Python having kind of game types later, is you then need to give types for all the built-ins and not all these built-ins were done in a way that permits easy types. So there is a typeshed library that attempts to describe them, but it’s a little awkward. You know, it’s pretty good, but not everything is perfectly correct. For example, the dictionary collection doesn’t quite implement the mutable mapping trait, but we pretend it does ‘cause otherwise, nothing would really work. But yeah, a lot of work’s been put into getting the types good. So any questions on the Python types or I’ll go into the Pyrefly design.

23:52

Cool. So we wanted Pyrefly to be efficient in time and memory, which to my mind, especially when you’ve got 20 million lines of code, means you better be incremental. And because they work at Meta, if it didn’t work on Instagram, people were probably very, very sad. It needs to be flexible. So command line, IDE/LSP, integrate into Buck, integrate into MCP servers for, you know, Claude code, et cetera. Maybe, it should integrate into Dune and Bazel as well, probably good ideas. We very definitely wanted it to be hackable so we could have an open source community. So simple code, solid principles. We don’t want the world’s most advanced type system. We want one that a large number of people can contribute to.

24:44

Good user experience, of course. And perhaps, the weird constraint is we needed to be able to deal with circular import graphs. So in Python, you can have a module import another module that imports another module that ends up importing itself and as long as, when you evaluate it, nothing goes too wrong, like that’s okay. So as a consequence, like any large Python program that didn’t do something specifically to clean up its import graph, will have circular imports.

25:18

So you know, given these constraints and the fact that I’m a build systems guy, the conclusion was obvious, write a build system for building a Python type checker. And in particular, in order to get the performance, one of the decisions I made was, we should work at the file level. So if you have 20 million lines of code, that’s plenty of files. So if you have invalidations and dependencies at the file level, ah, that’s fine. If you do them at the function level or at the statement level or at the expression level, well, that’s probably a little bit more than we’re willing to deal with. So in order to scale large, we went for operate at the file level and for the ability to evict all data. So normally, if you’ve got a build system, it just adds data until, you know, you run out of RAM and you kill it. So for Pyrefly, it was important that it was able to evict data that it no longer needed.

26:20

One of the conclusions of working at the file level is that when you press a keystroke, we’re gonna invalidate that entire file which means, you need to be super quick when dealing with a file like on the order of a few milliseconds. Yeah, we are usually, less than a few milliseconds. So that was the design goals. How do we go about implementing it?

26:47

So the architecture is, there is a build system that I’ve drawn in orange that has all the dependencies, invalidation, concurrency, all that stuff. And then there are six steps that a file goes through, the code, the AST, the exports, the binding, the answers, and the interface. I’ll go through each of those steps now, but the idea is each file makes a linear sequence through this and the build system coordinates all the other files going through linear sequences of this at the same time.

27:19

So the code step, we read the code off the disk or for something like an LSP where it’s in an in-memory buffer, we take it from the LSP, but basically, get a string of text that is the Python code. For the AST step, we parse it into an AST, nothing too interesting here, yeah. And we use the Ruff parser from Astral, which is a Python parser that is error correcting, so it will always return you a valid syntax tree and a list of errors. So we can type check any Python code, no matter how many errors, as you’re editing the buffer. But I will say that parsing partially erroneous code is like the ultimate fuzzer. I once accidentally tried to do Pyrefly on the Rust codebase and crashed Pyrefly in a number of places because the error corrected version of the Python AST of our Rust code is not what you would expect.

28:15

So now things get interesting, we get to the export phase. So it’s very useful to know what each Python module exports so that when you see some import * statements, you can know what is essentially, map the name to a qualified name so you can look up its type in the appropriate module. If you’re doing everything by module, you need to know which module to start looking at. So in Python, modules always export all their imports apart from those that come from builtins that behave differently for reasons. Actually, this is not trivial because in Python, you can have cycles of modules that do import * on each other. So you actually need to take at least a fixed point to figure out what a module exports, which is pretty horrific. So for example, this, where does Z come from? Well, it might come from Module 1, it might come from Module 2, who knows? We have to figure this out. So for every module, we figure out what it exports. And you can see that at this point, we’re looking at the exports of other modules to figure out, so you can see where the build graph is going.

29:29

Next, we get on to the binding statements, which is really the key design of Pyrefly. So for any program, we take every statement and we put it into a key-value mapping where the key and the value expresses how to compute the type for this key, which might depend on the types produced by other keys. So in this phase, we deal with all flow control, all sequencing, the idea that one statement is before or after another is completely dealt with by this phase. And then in the final phase, we’re only really looking at the expressions and how they type check.

30:10

So here, I’ve got a little piece of code, so I’m not gonna go through it all, but you can see things like on line one, X = F function call. So well, the key that is X at line one, so basically, for each identifier, we mark which identifier it is. So it’s got a unique name. So X at line one, well, that is the expression, F applied to the empty list of arguments. Okay, and how would I figure out what F is? Well, there is a use(f@line1) key that will forward towhat is probably, a Def of F@line 72 or wherever it is defined or an import of F from some other module. And then here, we assign Y in one branch and Y in the other branch. So to bring back Y at the end, we have phi, so phi, the traditional control flow join operator, Y at line six is the phi of the Y at line 3 and the Y at line 5. So essentially, when we evaluate this code, when we evaluate the right-hand side, it will look up the phi, the value of Y on line three, the value of Y on line five, and the phi operation, which in most cases is just union. So you can see how and even things like if isinstance, well, this would be a narrowing. So in this branch, I know that X is an int and you can see that there is a use X, but with this narrowing, applying to it as isinstance. So we’ve taken away a lot of the complexity and desugared it to, you could almost say, a domain-specific language for type checking Python.

32:15

Sure, go for it. Yeah, yeah, quickly, make them happy.

32:22

What about the walrus operator?

32:24

Ah, the walrus operator. So for those who don’t know, the walrus operator in Python is X := something, but it is an expression, not a statement. And it is a very annoying operator because basically, like to figure out the exports of a module, you have to look through all the statements apart from the walrus operator ‘cause that also, defines values that might be bound. The walrus operator is a pain. But it’s not actually as painful as you might think. So for example, this isinstance is essentially doing a narrowing of the type. So the walrus operator kind of fits into this. We just generate bindings, we figure out, you know, you have to kind of first split out the and’s and or’s, so you get more linear code and then you can point at X := that will be an assignment, I think, yeah. It all works out, it’s not very pleasant. That and loops are the places where kind of like gets a little bit hairy, but it all works out. But yeah, the walrus operator has very little value in Python, but huge cost on every surface that implements it, so yeah, I’d rather we hadn’t done that. Cool, any other questions?

33:54

Cool. And then finally, we move on to answers. Where essentially we have the key-value mappings. We have expressions and most of answers is given the types of the sub-expressions. How do I create the type of this larger expression? So solve the bindings that were created. The place where it gets tricky is the fact that Python has a lot of cycles. It has the cycles in the import graph. But for example, if you’re doing return type inference, you might have a function that calls a function that calls a function. And so the return type is itself in a cycle. And to do this, we instead say that every answer has a thunk. So a value that we will figure out later and then we try and evaluate it and if we get into a cycle, we use Var and solve that later, and I’ll explain how that works in a second. But the answers is basically, special case for every single Python type and special built-in operator in Python, which is a lot. And this tends to be where we add things to when they add things to the spec.

35:06

And then finally, we take the types of the exported things and keep them and throw everything else away. So the interface is a subset of the answers ‘cause it uses less memory and that’s nice.

35:21

So let’s do some audience participation. What is the type of X in this first line?

35:29

List of literal 1.

35:31

List of literal 1, I heard. Do I have any other answers? Yay, better maybe. So if list was covariant, then it would be better, but because the list is invariant, actually whether it’s list of literal 1, that might stop you from doing things with it that you might otherwise want. So we say list of int because we generalize before we put in a container, but it’s not a wrong answer. What’s the type of X here?

36:04

List, int, or string.

36:07

I hear list, int, or string, that’s a valid answer in what we would actually produce. Any other answers?

36:13

List Any.

36:14

List Any is one. List object would also be because int and string have a single super type. So options. What about this? Like I wrote literal 1, so it’d be nice if this didn’t type check error because I decided this was a list of int, but it would’ve been list of int, if I didn’t have that. So this means, we have to do kind of context aware type checking. We have to push down the context. Yeah, fairly standard in type checking. And what’s the type of this?

36:51

No, no, so in Pyrefly, what we will do is, we will type check this with the hint, that literal 1 would be, oh no, should be list of literal 1. Yes, it would be an error. But if it was list of literal 1, we would pass down the context list of literal 1, and Pyrefly would go, “Ooh, you want literal 1 and I’ve got a 1, okay, let me not bother generalizing.” Cool. And then final line?

37:20

List Any.

37:20

List Any would be a perfectly reasonable type and is in fact, what most type checkers give you.

List of Var.

37:32

Yes, yes, list of Var. So the way this works in Pyrefly is, we essentially say that the first use of a generic variable must fully constrain it so we don’t essentially allow anything we make you pick soon. So the way this code works, so we’ve got X is an empty list and then we add(x,1) and the signature for this is, I take a list of T and a T and you know, put them together. So this would be the standard list append operation in Python without the self stuff. So the way we solve this, we first say, X is list of a fresh variable that we have made up on the spot, let’s call it, ?1. I think in Pyrefly, we use adds. So if you ever see an add sneak out in an error message, we try and avoid it, but that will be where it comes from. Okay, the add function is a callable that takes a list of ?2 and ?2 and produces None. And so the add function is really, a generic function. So it’s for all some type T this, and whenever we start to use A for all, we’ll immediately instantiate a fresh value for T as a variable.

38:53

So we’ve now got these two things. So now let’s try and kind of slot them together. So the first argument, X is list ?1, and we’ll say that should be the same as list ?2. We actually say, one must be a subset of the other, but for simplification. Okay, we’ve said list ?1 and ?2 are the same. So that means ?1 and ?2 must be the same because they’re invariant. Okay, now let’s try the second argument. Well, ?2, this ?2 must be the same as literal 1, so okay, now, ?2 must be int because for certain variables, we know they’re like to be used in container locations. So we generalize a little bit before we shove them in ‘cause no one really wants a list of literal int, but perhaps excluding, oh, and ?2 and ?1 are the same. So ?1 must be int. So we can conclude that on this line, you meant, list of int and this was the line that told us that. For an assignment like this, we’ll always look at the next use of X and if you haven’t pinned it by then, we’ll go Any, but otherwise, we’ll take one peek ahead. That make sense?

40:12

Cool.

40:15

Eviction so that we don’t blow too much memory, we try and throw things away. So I showed you those steps. We will actually throw away the AST, the bindings, and the answers after we have completed and got the interface, apart from if you’ve got the file opened in an IDE, in which case, we’ll keep everything to do better, code nav, go to definition. And the way it works is, we can answer a question about a type in another module, an import export pair, either by looking at the answers and forcing the thunk, or by looking at the interface. If we’ve got the interface, we’ll use the interface. If we don’t, we’ll use the answers. And the reason we might have to use the answers is ‘cause there might be a cycle and we can’t finish computing all the answers until after we’ve produced this one particular answer for you. So it is very simple because it goes through simple steps and this saves way loads of memory.

41:17

The next interesting thing is incrementality with cycles. So build systems do generally not like cycles and most of them will just crash, if you give them a cycle. Some of them will take a whole cycle and collapse it into a single node. But if you have a large Python codebase, it’s not uncommon to have tens of thousands of modules in a cycle. So that would suck ‘cause you basically, have to recheck everything. So what we do in Pyrefly, if A changes, what we do is we say, well, let’s assume that B and C don’t change and type check A assuming the interface of B and C and then see what happens. Most of the time, A doesn’t change, B didn’t change, C didn’t change, we get away with it. We have new types for the details of A, but the interface of A didn’t change so we know that it didn’t lead to B and C. If it turns out we were wrong, then we have to, okay, well, A changed the interface. So let’s type check B and C assuming that A doesn’t change again, okay, if that works out, great, we stop. If that doesn’t work out, then we go, okay, well, let’s compute A again because we know B and C are correct with respect to each other and see if that changes. And then we decide, well, actually that sounds pretty dumb ‘cause we might just get in a loop and we might be here all night. So as soon as we hit something that’s already been computed or I think it’s a depth of five, we just give up and say, all right, invalidate the entire cycle, we’ll just do the correct thing even though it’s slower on average. Ah, the pro is, usually only one file invalidates so it’s much quicker. And yeah, we can get like a few milliseconds type checking time on a huge cycle which you couldn’t get if you invalidated everything. The cons are sometimes, you end up with a little bit less parallelism and checking things more than once.

43:11

Performance, the interesting thing here is there’s approximately 10x more time. So you’ve got exports, then you’ve got bindings, which is about 10x the time and then answers, which is about 10x the time. So the interesting thing is, we do a lot of things in parallelism and we try and be careful with our threads. There are lots of things you should do for a type checker like this, like using a thread pool and everything. But ah, it’s only 35% of the profile, it’s actually not a huge deal yet. I’ll get round to it at one point, but it hasn’t been the most important thing yet.

43:48

We have transactions. So this is so that the build system can do many things at once. So you could be doing a find references at the same time as a change or a change on multiple threads. And so this is just so that you can keep the IDE ticking over while you’re doing complicated queries in the background. And we do this basically with database transactions. But yeah, obviously, our own custom version.

44:17

Extensibility, so the classic Python type checker is Mypy and it has a bunch of type extensions where people have added support for Django or for Shape types or other things to Mypy as Python extensions, which kind of insert themselves and change some of the typing rules for Python. We weren’t very keen on trying to put an API on something that’s such an evolving space. So we’ve instead put these inside. So Pyrefly natively supports the Pydantic extensions and we’re currently working on the Django extensions. So our idea is to basically, shove everything into Pyrefly, but it also means things like Buck integration are native to Pyrefly and other build systems coming soon, I think someone is working on Bazel and no one’s asked yet for Dune, but if they did, we would add it.

45:15

Recursion, this is an interesting one. We essentially put a thunk in place where we think we might have recursion and every thread knows which thunks it is evaluating and when it sees a thread, it is already evaluating on this thread. So this core stack, it will create a Var to slap into where the recursion goes and then at the end of calculating, at the very end, it will say, well, this Var must be the thing it actually returned. And this mostly works to solve recursive problems, it doesn’t always work, but it’s kind of a very nice use of the fact that we have Var and we’re abusing it for list inference, recursion, generics, and much besides. So it was a very powerful little primitive to add.

46:10

Pyrefly is open source. We’ve gained so much from open source. Python is open source, PyTorch is open source, there’s lots of good Python open source stuff. So we’ve licensed it under MIT and we are delighted to accept all pull requests and we have a very active issue tracker, we have lots of people submitting pull requests, we have a Discord. So one of the things like some projects start very much as internal company projects that then get thrown over the wall at some point. So we’ve been very careful to say, well, Pyrefly actually, we value open source users more than Meta internal users because actually open source is where most of our Python comes from. If we can make Python better for the world, that is probably the highest leverage thing rather than just focusing on Meta. So it’s a big difference from something like Pyre.

47:06

I was gonna talk about the journey of autocomplete, but I think we’ll skip that other than the fact that autocomplete is very cool and actually very well used by Python developers.

47:18

So why not Pyrefly? So it’s an alpha, it’s still got lots of bugs. You’ll probably find bugs, but we’ll fix them, or you can fix ‘em yourself. We welcome pull requests, but if you use Pyrefly, you can get a sticker and I’ve got some stickers over there, some of them are very cool. This one even glows in the dark and we even dogfood our own stuff. This is my son doing his Python homework using Pyrefly.

47:43

There is a team and we even had matching berets, although, I think I gave mine to my son. So there’s a team and a bunch of people, but actually, we have over a hundred contributors on Pyrefly. So this photo was about three months ago, maybe six months ago when we were a much smaller team. And so any questions?

(audience applauding)

48:10

And for the questions, could you use the microphone so it’s captured for the video. Go for it.

48:18

Yeah, I had a type inference question. If you could go back a few slides back to your example. Yeah, yeah. That one? Yeah. Yeah, I don’t think why, oh, you were listing a lot of stuff. You had literal 1 example. You were like, “Oh, it’s a variant so, like, you can’t have the type of list of literal 1, which is true. But like if you have variables, you’re doing something like unification, you have subtyping around the place, I presume you have variables with bounds.

49:01

Yes, we do.

49:02

So the correct type for this is like list of variable whose bounded by literal 1 and then you can unify later to find its one or you can unify later to find its int and like it’ll just work.

49:13

Yep, we could do that. I mean, this is very much a user experience question. If it instantiates to the wrong thing, that would probably be upsetting to the user, but yeah

49:27

But it’s the most general type like it’s It is the correct type.

49:29

It is. Yeah.

49:29

Yeah, yeah, yeah.

49:30

We see this like honestly when people write this, they all always mean, list of int.

49:38

Oh sure, sure. But when you give the most general type, you can’t get them get confused like ‘cause whatever they in their head thought the type was, you know, it’s an instance of the thing you’ve said it is.

49:47

Exactly, and I mean, the more common cases where you have a common base class and you have two subclasses of it and you’ll do like a list of A and B and there’s no real good way to figure out whether they meant, a list of unions of these two things or a list of the super type of this thing. And we at the moment, make the decision that you probably meant union.

50:11

Sure. I think, yeah.

50:12

But if we did the most general type, we could defer it till they actually used it. Which might be more enjoyable.

50:15

Ergonomic, I would’ve thought, although like, to make it all work very nice, you have to have some decent lattice structure, which I’m gonna guess, you probably don’t have.

50:24

We do have a decent lattice structure.

50:26

Like it has nice properties. Ah, yeah. I’m surprised that, what you get for by accident is gonna be.

50:34

There’s multiple inheritance, which you know, means that there’s not necessarily unique lattice structure, but it is possible.

50:41

Yeah. Okay, so then you should be able. Yeah, we should, we had honestly hadn’t thought of it, but we might try it.

50:53

I had one other question which was, at one point, you said, you need the answers, rather than the interface, for typing a thing. So then your invalidation algorithm seemed to assume that the interface was the only thing that could affect the others in the cycle and that seemed like a disconnect.

51:08

So the answers has answers to every key binding pair. The interface has only the ones that are exported. You can only look at the answers that are exported. So we have a type level bool that is exported, so only

51:24

Is it that you couldn’t have worked out what the interface was yet?

51:26

Yes.

51:27

But once you know what it is, that’s the

51:29

Yeah, you’re fine. Yeah, the interface and the answers will say the same thing, but because the interface doesn’t have thunks in, I might need to try to do some of this answers, some of this answers, then the other bits of this answers and now, I can convert them both to interface. But once they were both interface, I could have used them both if I’d had them.

51:47

Gotcha, all right.

51:51

You’ve elaborated a bit on what your constraint problems are like and I think you suggested some approaches to type inference in well, constraints solving. Maybe I missed this, but could you actually say what your sort of constraints solving approach is? Could you sketch?

52:06

Yeah, so broadly speaking, so we’ve got the Var thing, which is the unknown variables that substitute in, everything else is pretty much a subtype of constraint. So for example, when you call a function, the thing you’re passing in must be a subtype of the function argument. So it’s almost always this type must be a subtype of this type. And because you’ve got unions and other funky stuff that sometimes, turns into like well, if you’ve got this union must be a subset of this union, that’s where the constraints get a bit tricky. Some type checkers then add intersection types which are very natural to model narrowing, but then cause you to essentially have SAT solving for the interface types, the subset check. So yeah, does that answer your question?

53:01

I think it mostly does. So it’s sort of traditional, you’re manipulating subtyping constraints.

53:06

Yeah.

53:07

Okay. I do have a follow up. I have to think about it for a second.

53:14

Okay, sure. Any other questions? Yep, please.

53:17

How does it work in practice? Like do you find that a decent proportion of open source projects have code that satisfies the type checkers like requirements? Basically, how compatible is it with like the status quo of other projects? Like what type checkers they already use?

53:38

Pretty compatible. So we recently have been working with PyTorch to switch them from Mypy to Pyrefly, which is probably one of the hairiest oldest largest codebases. I think there were a few hundred typing laws we had to add while we were converting and there were an equal number that we got rid of while we were converting. So that’s a very large codebase with many thousands of locations. So we’re about on par with the other type checkers. Some things we accept, some things we don’t. One of the bigger problems is, there are a bunch of things that we reject ‘cause they’re very, very definitely wrong, but some other type checkers fail to spot and then it’s kind of like, well, we can’t be too compatible with them because we do still want to catch our errors. But like for most projects, you know, like in a day, you could make it Pyrefly likely, if you wanted to. Appreciating that some projects are very big and much harder.

54:51

Anything else? Well, thank you very much for having me. It’s been a pleasure. (audience applauding)

55:00

Thank you very much, Neil. That was great. We all enjoyed this, and now, food.

Pyrefly: Type Checking 1.8 Million Lines of Python Per Second

Transcript

The next great idea will come from you