PyO3 0.21 is inbound with a pile of performance improvements (up to 25% in many cases) and new features such as early async support. The biggest piece is PyO3's new "Bound" API. This post is the first of a planned series to cover the design journey that led us to PyO3's upcoming 0.21 release.
I am aiming for PyO3 0.21 to be released within the next two weeks - so by March 17th the migration outlined here should be underway. We may start with a beta release before continuing with the final release next week, for anyone who wants to be an early adopter.
This post should probably be titled "Replacing PyO3's API without breaking everything downstream immediately". In future PyO3 releases we will require PyO3 users to update code. The point here is that we've tried very hard to add the new API while breaking as little code as possible.

Replacing PyO3's API. Image credit Dall-E.

PyO3 is the de-facto library to write Rust code that interoperates with Python. It is used in production worldwide as part of Python packages that total billions of annual downloads.

In my "Hello World" post I wrote about how the most important thing I see for PyO3 in the short term is completing PyO3's new "Bound" API and releasing it as part of PyO3 0.21. In this post, I'll go into detail about what that new API is, why the existing API wasn't good enough, and how we will all migrate to the new API.

This new "Bound" API is opt-in for PyO3 0.21 and will gradually phase out and replace the old "GIL Ref" API over the next few PyO3 releases. This migration will dominate these releases, so I want to take some time to explain to PyO3 users why this is happening and why my fellow maintainers and I made the choices we did.

I need to say a huge thanks to all the PyO3 maintainers who have helped make this release happen, and in particular to @adamreichold, @Icxolu and @LilyFoote.

Introducing PyO3's new API

Let's first make an important point: much of PyO3's API is not changing. Lots of PyO3 code will continue to work unchanged and will not ever need to be adjusted during this migration. Despite this, there is a fundamental rework afoot with respect to how PyO3 lets Rust code interact with Python objects.

In PyO3 0.20, the main type to interact with a Python object is PyAny. This is always used as a reference &'py PyAny, which we call a "GIL Ref" (I'll explain this name, the reference and its lifetime 'py in the next section). There are also types for specific Python types, such as &'py PyList. Finally, Rust code can define new Python types using PyO3's #[pyclass] procedural macro. For a new Python type Foo, Rust code also refers to this using a GIL Ref &'py PyCell<Foo>. (You can read about pyclass and PyCell in PyO3's documentation.)

In PyO3 0.21, we will introduce the Bound<'py, T> smart pointer. Instead of the GIL Refs above, we can use Bound<'py, PyAny>, Bound<'py, PyList>, and Bound<'py, Foo>. Doing this allows one fundamental improvement over the GIL Refs: we bring Rust's notion of ownership out of PyO3's internals and directly into PyO3 users' code. This has several key knock-on effects:

It improves CPU performance: for example, iterating a Python list is approximately 25% faster with the Bound API compared to the GIL Ref API.
It improves memory performance: for example, the GIL Ref API would have linear memory growth when iterating a Python list (yes, this really was the case, and I knew that this would need solving before PyO3 1.0). The new Bound API causes no memory growth from iteration.
It gives users fine-grained control over how long Python objects exist.

Giving control of ownership does increase complexity for PyO3 users compared to what they had previously with the GIL Ref API. I think this is more than justified by the fact that a good understanding of ownership is a core part of Rust anyway, and the performance wins are significant.

Keeping compatibility

Given the huge amount of effort that could be pushed onto PyO3 users by replacing one set of types with another across a whole codebase, it was important for us to keep to two key points:

Existing code for PyO3 0.20 should continue to work without edits in PyO3 0.21.
The amount of changes to update code to use PyO3 0.21's new Bound API should be as small as possible and easy to review.

We managed to deliver the first bullet by keeping PyO3's whole existing API unchanged, with only a few tiny exceptions where we judged the impact on user code to be minimal or nonexistent. This should allow users to update to PyO3 0.21 straightforwardly, leaving them free to then upgrade their code to the new Bound API incrementally at a pace that suits them.

To illustrate this API change, let's write a hypothetical PyO3 function map_with_index that takes a Python list of values, for each value calls a callback function containing the value plus its corresponding index in the list, and returns a list of the results of each callback function call.
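
To pin down the behaviour described above before bringing PyO3 into it, here is a minimal plain-Rust sketch of the same idea. The name map_with_index_plain and its generic signature are my own assumptions for illustration; this is not PyO3 code, just an analogue of what the function is meant to compute.

```rust
/// Plain-Rust analogue of `map_with_index` (hypothetical helper, no PyO3):
/// call `callback` with an `(index, value)` pair for each element of `list`
/// and collect the results into a new list.
fn map_with_index_plain<T: Clone, R>(
    list: &[T],
    callback: impl Fn((usize, T)) -> R,
) -> Vec<R> {
    list.iter()
        .cloned()
        // Pair each value with its position in the list
        .enumerate()
        // Pass the pair to the callback as a single argument
        .map(callback)
        .collect()
}
```

With an identity callback and the values 1 through 4, this sketch returns the pairs (0, 1), (1, 2), (2, 3) and (3, 4).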

In PyO3 0.20, we could write that function like this:

#[pyfunction]
fn map_with_index<'py>(
    list: &'py PyList,
    callback: &'py PyAny
) -> &'py PyList {
    // Get the token that proves we can interact with Python objects safely
    let py: Python<'py> = list.py();

    // Create a new list by iterating over the original list and calling the callback
    // with the index and the item
    PyList::new(
        list.py(),
        list.iter().enumerate().map(|(i, item)| {
            // Create a tuple of the index and the item
            let pair = PyTuple::new(py, [i.to_object(py), item.to_object(py)]);

            // Call the callback with the pair as a single positional argument
            //
            // NB real code should not use unwrap and instead handle errors,
            // but this is just a simple example to illustrate the API change
            callback.call1((pair,)).unwrap()
        }),
    )
}

Using this function from Python with the values 1 through 4 and an identity callback looks like this:

map_with_index([1, 2, 3, 4], lambda x: x)

In PyO3 0.21, this code can opt-in to the new API by replacing the GIL Refs with Bound values, and replacing PyList::new and PyTuple::new with new_bound constructors instead (see lines 3, 4, 5, 11, and 15 below):

#[pyfunction]
fn map_with_index_bound<'py>(
    list: Bound<'py, PyList>,
    callback: Bound<'py, PyAny>,
) -> Bound<'py, PyList> {
    // Get the token that proves we can interact with Python objects safely
    let py: Python<'py> = list.py();

    // Create a new list by iterating over the original list and calling the callback
    // with the index and the item
    PyList::new_bound(
        list.py(),
        list.iter().enumerate().map(|(i, item)| {
            // Create a tuple of the index and the item
            let pair = PyTuple::new_bound(py, [i.to_object(py), item.to_object(py)]);

            // Call the callback with the pair as a single positional argument
            //
            // NB real code should not use unwrap and instead handle errors,
            // but this is just a simple example to illustrate the API change
            callback.call1((pair,)).unwrap()
        }),
    )
}

By making these relatively simple changes, this function immediately gets the ~25% speedup of the new API. Here's an ipython session using %timeit to demonstrate this:

#
# Some setup (I created a Python module called `pyo3_scratch`
# with the code snippets above)
#
In [1]: from pyo3_scratch import map_with_index, map_with_index_bound
In [2]: values = [1, 2, 3, 4]
In [3]: identity = lambda x: x

#
# Timing measurements with list of length 4
#
In [4]: %timeit map_with_index(values, identity)
341 ns ± 1.07 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [5]: %timeit map_with_index_bound(values, identity)
244 ns ± 0.593 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

#
# Timing measurements with list of length 40,000
# to show this performance gain scales with input
#
In [6]: values_big = values * 10_000
In [7]: %timeit map_with_index(values_big, identity)
3.02 ms ± 10 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [8]: %timeit map_with_index_bound(values_big, identity)
2.4 ms ± 6.05 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

For both the short and long lists, we see that map_with_index_bound performs around 20-30% faster and with around 40% lower variation in the timing. Very pleasing to see, given that only 5 lines of code needed some very minimal edits!
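
For anyone who wants to check those percentages, the arithmetic can be sketched in a few lines of Rust. Here I read "faster" as the reduction in time per call, and percent_reduction is a helper name I made up for this post; the inputs are the %timeit figures quoted above.

```rust
/// Percentage reduction going from `old` to `new` (both in the same units).
/// Hypothetical helper for this post, not part of any library.
fn percent_reduction(old: f64, new: f64) -> f64 {
    (old - new) / old * 100.0
}

/// The four comparisons from the ipython session, as (label, reduction in %).
fn timing_reductions() -> Vec<(&'static str, f64)> {
    vec![
        // Mean time per call, list of length 4 (ns): about 28.4%
        ("short list, mean", percent_reduction(341.0, 244.0)),
        // Mean time per call, list of length 40,000 (ms): about 20.5%
        ("long list, mean", percent_reduction(3.02, 2.4)),
        // Standard deviation, list of length 4 (ns): about 44.6%
        ("short list, std dev", percent_reduction(1.07, 0.593)),
        // Standard deviation, list of length 40,000 (µs): about 39.5%
        ("long list, std dev", percent_reduction(10.0, 6.05)),
    ]
}
```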

Why does this change improve so much

It's fair to ask what was the fundamental design problem with the GIL Ref API that meant we could not fix the performance issues without replacing the whole API. The answer lies in the lifetime 'py that those GIL Refs had (e.g. &'py PyAny). For the sake of this post let's just take lifetimes as a Rust syntax used to describe how long something is valid (there are many good posts out there explaining lifetimes in detail, such as this one). In the case of our GIL Refs, our lifetime 'py is overloaded to mean two things:

The lifetime 'py that Rust code can safely interact with the Python interpreter.
The lifetime 'py that Rust code owns a reference to the Python object in question.

The first bullet, safe interaction with the Python interpreter, is fundamental to PyO3. We will cover that in detail in a later post in this series when we discuss how PyO3's API might evolve more in the future. For now, it's the second bullet, Python object lifetimes, which concerns the GIL Ref API, and we will explain more in a moment.

It is a problem that this 'py lifetime is overloaded to mean two things: it reduces expressiveness of PyO3 code by tying two ideas into one. This takes control away from PyO3 users and adds complexity to PyO3's internals to bind these two meanings together. Even worse, we discovered recently that PyO3's implementation of this made a fundamental assumption which the Python gevent package breaks, leading the combination of PyO3 and gevent to be unsound. If the previously mentioned reasons for the new Bound API were not compelling enough, it is critical that we remove the soundness hole in PyO3 0.20's GIL Ref implementation.

To explain how the overloaded 'py lifetime reduced expressiveness of PyO3 user code, let's start with a PyO3 0.20 function which just calls the Python repr() function and converts that into a Rust &str, using GIL Refs. We'll rewrite it using PyO3 0.21's Bound API and see how the extra expressiveness of the Bound API gives user code control of ownership.

Here's the PyO3 0.20 version using GIL Refs:

fn object_repr<'py>(obj: &'py PyAny) -> PyResult<&'py str> {
    let py_string: &'py PyString = obj.repr()?;
    py_string.to_str()
}

This code uses PyO3's repr() function to call obj's implementation of repr(), getting a Python str object as a &'py PyString GIL Ref called py_string. Next, it calls .to_str() on that GIL Ref to read the UTF-8 data directly out of that str without copying. In Rust terminology, the returned &'py str is borrowed from py_string. This borrowing is valid because py_string and the returned &'py str both have the lifetime 'py. The Rust compiler understands that the Python object py_string will keep the memory which the &'py str is reading valid for the whole lifetime 'py.

In PyO3 0.21, we could try to migrate the code to the Bound API by replacing &'py PyAny with Bound<'py, PyAny> and the same for &'py PyString, like so:

fn object_repr_bound<'py>(obj: Bound<'py, PyAny>) -> PyResult<&'py str> {
    let py_string: Bound<'py, PyString> = obj.repr()?;
    py_string.to_str()
}

This time, it turns out the migration is not so easy, and we immediately hit a borrow checker error:

error[E0515]: cannot return value referencing local variable `py_string`
 --> src/lib.rs:3:5
  |
3 |     py_string.to_str()
  |     ---------^^^^^^^^^
  |     |
  |     returns a value referencing data owned by the current function
  |     `py_string` is borrowed here

What changed? The answer is that Bound<'py, PyString> removed that second meaning from the 'py lifetime; the Python str named by py_string now only lives as long as the binding py_string does in this Rust code. As soon as the function ends, py_string falls out of scope and so the Python str gets deleted. Therefore it's not valid to return data borrowed from it out of the function, as the compiler reports above.

It turned out that the GIL Ref API's first meaning of the 'py lifetime, the lifetime that Rust code can safely interact with the Python interpreter, is usually a very broad lifetime which can span a lot of PyO3 user code. By having that second meaning, object ownership, tied to the 'py lifetime too, PyO3 was forced to have an internal mechanism to keep Python objects alive far longer than they needed to be. This created the CPU and memory overheads that the Bound API removes.
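
As a rough illustration of why this matters, here is a toy model in plain Rust with no PyO3 involved. The names Obj, Stats, peak_with_pool and peak_with_ownership are all made up for this sketch, and the pool is only a stand-in for the general idea of keeping objects alive until a broad scope ends; it is not PyO3's actual internal mechanism.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Counts how many toy objects are alive now and the most alive at once.
#[derive(Default)]
struct Stats {
    live: Cell<usize>,
    peak: Cell<usize>,
}

/// Toy stand-in for a Python object (hypothetical, not a PyO3 type).
struct Obj {
    stats: Rc<Stats>,
}

impl Obj {
    fn new(stats: &Rc<Stats>) -> Obj {
        let live = stats.live.get() + 1;
        stats.live.set(live);
        stats.peak.set(stats.peak.get().max(live));
        Obj { stats: Rc::clone(stats) }
    }
}

impl Drop for Obj {
    fn drop(&mut self) {
        // The object is released: one fewer alive
        self.stats.live.set(self.stats.live.get() - 1);
    }
}

/// Broad-scope style: every temporary is parked in a pool that outlives the loop.
fn peak_with_pool(n: usize) -> usize {
    let stats = Rc::new(Stats::default());
    let mut pool: Vec<Obj> = Vec::new();
    for _ in 0..n {
        pool.push(Obj::new(&stats)); // stays alive until `pool` is dropped
    }
    stats.peak.get()
}

/// Ownership style: each temporary is owned by the loop body.
fn peak_with_ownership(n: usize) -> usize {
    let stats = Rc::new(Stats::default());
    for _ in 0..n {
        let _item = Obj::new(&stats); // dropped at the end of each iteration
    }
    stats.peak.get()
}
```

Under this model, peak_with_pool(1000) reports 1000 objects alive at once, while peak_with_ownership(1000) reports 1. That is the flavour of the linear memory growth mentioned earlier, reduced to its simplest form.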

What about the solution for the compile error above? Well, the PyO3 user now faces the same choices for ownership that they would in any other Rust code.

They could copy the text data out of py_string and into an owned Rust String:

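fn object_repr_bound_owned<'py>(obj: Bound<'py, PyAny>) -> PyResult<String> {
    let py_string: Bound<'py, PyString> = obj.repr()?;
    // to_str() returns a PyResult<&str>, so propagate any error with `?`
    // before copying the text into an owned String
    Ok(py_string.to_str()?.to_owned())
}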

Or they could return the Python str directly, and skip converting it to a Rust form at all:

fn object_repr_return_py_string<'py>(obj: Bound<'py, PyAny>) -> PyResult<Bound<'py, PyString>> {
    obj.repr()
}

Or they could find other ways to store py_string for an appropriate duration. PyO3 0.21 is exploring a PyBackedStr type which may help in these cases. I won't link it here as its design is not yet finalised.
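
To give a flavour of what "storing the owner for an appropriate duration" can look like in general, here is a plain-Rust sketch of the pattern: bundle the owner of the data together with the view of it, so the text stays valid for as long as the bundle exists. BackedText and describe are hypothetical names for this post. This is not PyBackedStr, whose design is not yet finalised, and it makes no claim about how that type will work.

```rust
use std::ops::Deref;
use std::rc::Rc;

/// Hypothetical helper: holds the owner of some text, so the text can be
/// read as a `&str` for as long as this value is alive.
struct BackedText {
    owner: Rc<String>,
}

impl Deref for BackedText {
    type Target = str;

    fn deref(&self) -> &str {
        // The borrow is tied to `self`, which keeps `owner` alive
        self.owner.as_str()
    }
}

/// Builds some text inside the function and returns it together with its
/// owner, instead of returning a `&str` borrowed from a local variable.
fn describe(n: u32) -> BackedText {
    let text = Rc::new(format!("<item {}>", n));
    BackedText { owner: text } // `text` is moved out, not dropped here
}
```

A caller can then use the result much like a &str, for example describe(7).len() or &*describe(7), without the function having to hand out a reference to its own local data.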

Which choice is best will depend on the user's program. By introducing this new Bound API we push an additional concept onto users, many of whom are Python developers trying Rust for the first time. We'll try our best to help teach ownership through PyO3's documentation. By giving users this control, we help them to think in idiomatic Rust and give their programs the best performance.

Why the "Bound" name

Now that we've explained ownership, and in particular how the 'py lifetime had two meanings in the GIL Ref API, we are ready to justify why we picked the name "Bound".

It's because of the one meaning that the 'py lifetime will continue to keep in the Bound API: the lifetime for which Rust code can safely interact with Python objects. When we write a Python object (of any type) as Bound<'py, PyAny>, in PyO3 we say that we have bound this object to the lifetime 'py in order to interact with it safely.

PyO3 already has an existing "unbound" smart pointer, Py<T>, which is just like Bound<'py, T> except it does not have this binding to the 'py lifetime. Without this lifetime binding on Py<T> we cannot call methods such as .repr() or .to_str(), as doing so would incorrectly break requirements of Python's C API which PyO3 and Rust use to interact with Python objects. There are good use-cases for Py<T> covered in PyO3's documentation, but it's outside the scope of this post to explain them now.
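
The distinction between the two smart pointers can be sketched with a toy model in plain Rust. Everything here (Token, with_token, Unbound, BoundHandle and their methods) is a hypothetical stand-in that I wrote for this post; these are not PyO3's real types and the real implementations differ. The point is only to show the shape of the idea: methods that touch the object exist solely on the handle that carries the 'py lifetime.

```rust
use std::fmt::Debug;
use std::marker::PhantomData;

/// Toy token: holding one stands for "it is safe to talk to the interpreter".
#[derive(Clone, Copy)]
struct Token<'py>(PhantomData<&'py ()>);

/// Toy entry point that hands out a token for the duration of the closure.
fn with_token<R>(f: impl for<'py> FnOnce(Token<'py>) -> R) -> R {
    f(Token(PhantomData))
}

/// Toy "unbound" handle: it can be stored anywhere, but offers no methods
/// that touch the wrapped object.
struct Unbound<T>(T);

/// Toy "bound" handle: the same object, tied to a token lifetime 'py.
struct BoundHandle<'a, 'py, T> {
    value: &'a T,
    _token: Token<'py>,
}

impl<T> Unbound<T> {
    /// Binding requires a token, which proves interaction is currently safe.
    fn bind<'a, 'py>(&'a self, token: Token<'py>) -> BoundHandle<'a, 'py, T> {
        BoundHandle { value: &self.0, _token: token }
    }
}

impl<'a, 'py, T: Debug> BoundHandle<'a, 'py, T> {
    /// Only the bound handle can interact with the object.
    fn repr(&self) -> String {
        format!("{:?}", self.value)
    }
}
```

In this model, a stored Unbound(String::from("hi")) can only produce its repr inside with_token(|py| stored.bind(py).repr()); there is simply no repr method to call on the unbound handle.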

I'm sure that given infinite time someone would have likely come up with an even better name suggestion than Bound. As it is, I am very happy with the choice we have made.

How this migration will unfold

The migration of PyO3 fully to the new Bound API is expected to go in the following stages:

PyO3 0.21

PyO3 0.21 introduces the new Bound API as an opt-in migration, allowing users to update at their own pace to gain the performance and safety improvements:

We introduce new _bound method and macro variants across a large set of PyO3 APIs to enable this opt-in migration, such as PyList::new_bound and PyTuple::new_bound that we saw above.
The existing GIL Ref API is deprecated, but to help users manage the huge number of deprecation warnings we're adding a gil-refs Cargo feature to PyO3 0.21 that turns off these warnings. We recommend users enable this feature until they have completed migrating their code.

PyO3 0.22

It is expected that in PyO3 0.22 we'll push a little bit harder for users to migrate:

The GIL Ref API will only be available with the gil-refs Cargo feature enabled.
Even with the feature enabled, all uses of the GIL Ref API will emit deprecation warnings.

PyO3 0.23

PyO3 0.23 is expected to completely remove the GIL Refs API and the gil-refs feature.

Where previously we made _bound variants of functions to keep compatibility, we will now be able to reuse the old names. The _bound variant names will be deprecated. Users should be able to easily simplify their code by removing the _bound portion of these names. For example, PyList::new_bound can be renamed back to just PyList::new.

From this point, PyO3 can begin to explore a few small optimisations made possible by the Bound API which would require small breaking API changes. The previous need for compatibility with the GIL Ref API prevented us from making these in PyO3 0.21. These optimisations wouldn't have such a drastic effect on user code, but they might be nice follow-ups.

Summary

In this post I've attempted to justify why PyO3 needs to undergo a significant API migration, and how we have done our utmost to make it easy for users to keep up with PyO3 in this upgrade. By doing this we remove overheads and issues in PyO3's existing API which I considered must-fix items before a PyO3 1.0 could ever be considered.

We spoke about the fundamental limitation of the GIL Ref API and how the Bound API hands users direct control of ownership, avoiding this limitation by doing so.

In the next post, we're going to cover in detail how the GIL Ref API worked and why that was problematic. We'll go from there into the design choices that my fellow PyO3 maintainers and I explored while finding a solution that could work for us all.

David Hewitt

Replacing PyO3's API without breaking everything downstream
