ADVANCED PYTHON PROGRAMMING

Descriptors Aplenty

This time, we’re going to use descriptors for all sorts of cool stuff, from properties and class methods to cached and typed attributes.

13 min read · Apr 22, 2020

Last time, we saw how to use descriptors to implement basic Python functionality—namely, methods. This time, we’ll expand our scope to implement more advanced features like properties, class methods, and some cool tricks that aren’t available by default.

Proper Properties

Let’s start with properties; but first, let’s come up with a background story for motivation. Say we’re developing a class that encapsulates a 2-dimensional point, and we start with an implementation that uses cartesian coordinates:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

Anyone who’s ever programmed C++ or Java would tell you this is a bad idea—you’re exposing your implementation, and effectively committing to it by making it part of a public interface. Better do something like this:

class Point:
    def __init__(self, x, y):
        self._x = x
        self._y = y

    def get_x(self):
        return self._x

    def set_x(self, x):
        self._x = x

    def get_y(self):
        return self._y

    def set_y(self, y):
        self._y = y

This is much more cumbersome, sure—but at least this way, if the day comes that you choose to switch to polar coordinates, your interface would still be intact—you could simply rewrite (get|set)_(x|y) to do the necessary math:

import math

class Point:
    def __init__(self, r, t):
        self._r = r
        self._t = t

    def get_x(self):
        return self._r * math.cos(self._t)

    def set_x(self, x):
        y = self.get_y()
        self._r = (x**2 + y**2)**0.5
        self._t = math.atan2(y, x)

    def get_y(self):
        return self._r * math.sin(self._t)

    def set_y(self, y):
        x = self.get_x()
        self._r = (x**2 + y**2)**0.5
        self._t = math.atan2(y, x)

But that’s… well, bad; let’s fix it with descriptors! We start simple:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

And if the day does come when we decide to switch to polar coordinates, we’ll simply replace x and y—from regular attributes, to dynamically-computed properties. Whenever the seemingly attribute-like x or y are accessed, we’ll actually run some quick code behind the scenes, do the same math as before, and produce a result; no one will suspect a thing. But how can we run some quick code when an object is accessed through another object?

class Property:
    def __init__(self, f):
        self.f = f

    def __get__(self, instance, cls):
        # Compare with None: an instance may itself be falsy
        if instance is None:
            return self
        return self.f(instance)

This simple Property class takes a function—so it’s intended to be used as a decorator class—and stores it for later. When it’s accessed through a class, it returns itself—that’s just good practice for descriptors that don’t have anything better to do; and when it’s accessed through an instance, it actually runs the function from before, with that instance as its argument—so effectively, executes it as a method! Then, you can do this:

import math

class Point:
    def __init__(self, r, t):
        self.r = r
        self.t = t

    @Property
    def x(self):
        return self.r * math.cos(self.t)

    @Property
    def y(self):
        return self.r * math.sin(self.t)

And even though we no longer store the cartesian coordinates as they are, it’ll look like we do: the x method will be replaced with a Property object, which will invoke it whenever it’s accessed (without any parentheses); same goes for y.

>>> p = Point(1, math.pi / 2)
>>> p.x
6.123233995736766e-17 # Effectively zero, up to floating-point error
>>> p.y
1.0
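To see both branches of __get__ at work, here is a small self-contained sketch; it assumes a Property along the lines of the one above and a stripped-down Point with only x:

```python
import math


class Property:
    def __init__(self, f):
        self.f = f

    def __get__(self, instance, cls):
        if instance is None:
            return self
        return self.f(instance)


class Point:
    def __init__(self, r, t):
        self.r = r
        self.t = t

    @Property
    def x(self):
        return self.r * math.cos(self.t)


p = Point(2, 0)
via_instance = p.x   # Runs the wrapped function: 2 * cos(0)
via_class = Point.x  # No instance involved, so the descriptor returns itself
```

Accessing x through the class hands back the Property object, which is handy for introspection and for anything that later needs to reach the descriptor's own methods.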

We can even support assignment, but that’s a bit trickier. See, we haven’t actually defined a descriptor—we snuck it in cleverly by using it as a decorator class, which switched the original function with its new instance. To communicate both getting and setting logic, we’d have to give up our sexy syntax:

class Property:
    def __init__(self, getter, setter):
        self.getter = getter
        self.setter = setter

    def __get__(self, instance, cls):
        if instance is None:
            return self
        return self.getter(instance)

    def __set__(self, instance, value):
        self.setter(instance, value)

Instead, we’d define both functions, and pass them to a descriptor:

class Point:
    def __init__(self, r, t):
        self.r = r
        self.t = t

    def get_x(self):
        return self.r * math.cos(self.t)

    def set_x(self, x):
        y = self.get_y()
        self.r = (x**2 + y**2)**0.5
        self.t = math.atan2(y, x)

    x = Property(get_x, set_x)

    ... # Same for y

Now, we can do both p.x and p.x = 1, and both will invoke actual code. But that’s a lot of hassle—not much better than the Java situation, really (except we can stick to a simpler design and not worry about future compatibility, as long as we don’t refactor our implementation too radically). The decoration trick was really much nicer—if only there was a way to use it again.

class Point:
    def __init__(self, r, t):
        self.r = r
        self.t = t

    @Property
    def x(self):
        return self.r * math.cos(self.t)

    @x.setter
    def x(self, x):
        self.r = (x**2 + self.y**2)**0.5
        self.t = math.atan2(self.y, x)

    ... # Same for y

That takes a moment to digest. We decorate the function x with the Property class, which replaces it with a Property instance; and then we use that same instance, which has a method called setter, to decorate another function, also named x, whose signature and code are meant for assignment. Here’s how it’s done:

class Property:
    def __init__(self, f):
        self.getter = f
        # Stored as _setter so it doesn't shadow the setter() method below
        self._setter = None

    def __get__(self, instance, cls):
        if instance is None:
            return self
        return self.getter(instance)

    def __set__(self, instance, value):
        if self._setter is None:
            raise AttributeError('not settable')
        self._setter(instance, value)

    def setter(self, f):
        self._setter = f
        return self

First, we decorate the getter—and __get__ works like before. But then, we also have __set__, which invokes the setter, or raises an error if it isn’t defined. To define it, you’d have to use its setter method to decorate another function, which is then collected and replaced by the same descriptor, which can now handle both cases.

As weird as it sounds, it’s important to stick to the same name in both the getter and the setter, because decoration is really just syntactic sugar for:

class Point:
    def __init__(self, r, t):
        self.r = r
        self.t = t

    def x(self):
        return self.r * math.cos(self.t)
    x = Property(x)

    set_x = x.setter # Looked up before the next def rebinds the name x
    def x(self, x):
        self.r = (x**2 + self.y**2)**0.5
        self.t = math.atan2(self.y, x)
    x = set_x(x)

    ... # Same for y

If we name the setter function z, we’d get z = x.setter(z), ending up with both p.x and p.z referencing the same property.
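Here is a small sketch of that naming pitfall. It assumes a Property whose setter method returns the same descriptor, as ours does; the Box class and its _value attribute are made up for illustration:

```python
class Property:
    def __init__(self, f):
        self.getter = f
        self._setter = None

    def __get__(self, instance, cls):
        if instance is None:
            return self
        return self.getter(instance)

    def __set__(self, instance, value):
        if self._setter is None:
            raise AttributeError('not settable')
        self._setter(instance, value)

    def setter(self, f):
        self._setter = f
        return self


class Box:
    def __init__(self):
        self._value = 0

    @Property
    def x(self):
        return self._value

    @x.setter
    def z(self, value):  # Different name: the same Property is bound to z too
        self._value = value


b = Box()
b.z = 5  # Goes through the shared descriptor's __set__
```

After this, b.x and b.z both read 5, and Box.__dict__ holds the very same Property object under both names, which is rarely what anyone wants.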

Having implemented it all ourselves, it’s time to disclose Python actually comes with the built-in property class, which is used exactly like ours:

class Point:
    def __init__(self, r, t):
        self.r = r
        self.t = t

    @property
    def x(self):
        return self.r * math.cos(self.t)

    @x.setter
    def x(self, x):
        self.r = (x**2 + self.y**2)**0.5
        self.t = math.atan2(self.y, x)

    ... # Same for y

It even has a deleter, if you want some custom __delete__ logic.
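As a rough sketch of what a deleter might be used for, here is the built-in property with all three hooks; the Connection class and its object() stand-in for a real resource are assumptions made up for this example:

```python
class Connection:
    def __init__(self):
        self._handle = None

    @property
    def handle(self):
        # Create the (pretend) resource the first time it's needed
        if self._handle is None:
            self._handle = object()
        return self._handle

    @handle.setter
    def handle(self, value):
        self._handle = value

    @handle.deleter
    def handle(self):
        # "del conn.handle" lands here instead of removing an attribute
        self._handle = None


conn = Connection()
first = conn.handle
del conn.handle       # Runs the deleter, which resets the stored handle
second = conn.handle  # A fresh stand-in object is created on demand
```

The deleter follows the same naming rule as the setter: decorate a function with the same name as the property.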

Classy Methods

To continue our story—even with x and y as properties, switching from cartesian to polar coordinates changes the constructor; and there’s no way to unring that bell. We could, of course, try extending it in some horrible way:

class Point:
    def __init__(self, x=None, y=None, r=None, t=None):
        if (
            x is None and y is None
            and r is None and t is None
        ) or (
            x is not None and y is not None
            and r is not None and t is not None
        ):
            raise ValueError('provide either (x, y) or (r, t)')
        if x is not None:
            r = (x**2 + y**2)**0.5
            t = math.atan2(y, x)
        self.r = r
        self.t = t

But that’s… yeah. In other languages, this issue is addressed with method overloading (although in this particular case, both signatures would have two floats, so there’d be no way to tell them apart). But speaking of other languages—both C++ and Java provide a way to define static methods, which operate on the class level rather than the instance level, and can provide different ways to instantiate the object, in what’s called static factory methods. In Python, we can emulate it by making a decision—for example, “the implementation is polar, and so is the constructor”—and directing people to never construct an object directly, using various factory functions instead:

class Point:
    def __init__(self, r, t):
        self.r = r
        self.t = t

def from_cartesian(x, y):
    r = (x**2 + y**2)**0.5
    t = math.atan2(y, x)
    return Point(r, t)

def from_polar(r, t):
    return Point(r, t)

Now, doing p = Point(1, math.pi / 2) would be wrong—at least, morally wrong, since it’s not actually enforced. Instead, you’d have to do something like p = from_polar(1, math.pi / 2) or p = from_cartesian(0, -1). It’s alright—but it would’ve been neater if we could have these factories tucked away in the class: Point.from_polar and Point.from_cartesian. The problem is, those methods don’t receive self—they’re invoked through the class before any self is created, after all—so if we put them in the class, and someone accidentally accesses them through an instance, things are gonna break. Unless…

class StaticMethod:
    def __init__(self, f):
        self.f = f

    def __get__(self, instance, cls):
        return self.f

… We replace the functions with a StaticMethod object (again, using decoration), which would shield them from all this descriptor nonsense: a function will remain a function, not bind to anything or have any automatic arguments. As simple as it is, it works:

class Point:
    def __init__(self, r, t):
        self.r = r
        self.t = t

    @StaticMethod
    def from_cartesian(x, y):
        r = (x**2 + y**2)**0.5
        t = math.atan2(y, x)
        return Point(r, t)

    @StaticMethod
    def from_polar(r, t):
        return Point(r, t)

You can now call both Point.from_cartesian and p.from_cartesian—and they’d work exactly the same. However, we can do even better: instead of hardcoding Point in static methods, which wouldn’t work for subclasses, we could pass in the class dynamically—we get it in the descriptor, after all—by making it, say… the first argument, automatically?
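To make the subclass problem concrete, here is a minimal sketch; LabeledPoint is a hypothetical subclass invented for the example, and StaticMethod is assumed to look like the one above:

```python
import math


class StaticMethod:
    def __init__(self, f):
        self.f = f

    def __get__(self, instance, cls):
        return self.f


class Point:
    def __init__(self, r, t):
        self.r = r
        self.t = t

    @StaticMethod
    def from_cartesian(x, y):
        r = (x**2 + y**2)**0.5
        t = math.atan2(y, x)
        return Point(r, t)  # Hardcoded: always builds a plain Point


class LabeledPoint(Point):
    pass


p = LabeledPoint.from_cartesian(3, 4)
built_type = type(p).__name__  # 'Point', not 'LabeledPoint'
```

The factory was called through LabeledPoint, but it has no way of knowing that, so the subclass silently gets instances of its parent.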

Imagine a new kind of method—class methods, if you will—that automatically get the class (let’s call it cls) instead of self as their first argument, so that they can operate on the class level rather than on the instance level. First, let’s have:

class ClassMethod:
    def __init__(self, f):
        self.f = f

    def __get__(self, instance, cls):
        return BoundClassMethod(self.f, cls)

This would wrap up a function with a ClassMethod object that, upon access, will return a class method bound to that class:

class BoundClassMethod:
    def __init__(self, f, cls):
        self.f = f
        self.cls = cls

    def __call__(self, *args, **kwargs):
        return self.f(self.cls, *args, **kwargs)

That object would then forward any arguments to the original function, except it’d add the class as its first argument, letting us write this elegant code:

class Point:
    def __init__(self, r, t):
        self.r = r
        self.t = t

    @ClassMethod
    def from_cartesian(cls, x, y):
        r = (x**2 + y**2)**0.5
        t = math.atan2(y, x)
        return cls(r, t)

    @ClassMethod
    def from_polar(cls, r, t):
        return cls(r, t)

Cool, no? Very balanced, very symmetrical—and more importantly, dynamic: Point gets passed in as cls, and then invoked on r and t to construct a new instance, without us having to hardcode its name.

Python actually provides both those features via the built-in staticmethod and classmethod classes, which behave in a similar fashion. Personally, I never use the static one—it just doesn’t really make sense to add a function to a class, if that function doesn’t care and isn’t aware of the situation. It’s class methods—methods that operate on the class level—that we wanted all along; we’ve just grown used to getting a half-baked version of it and making do.
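For reference, here is roughly how the same Point might look with the built-ins. LabeledPoint and the normalize_angle helper are hypothetical, added only to show a subclass and a function that needs neither self nor cls:

```python
import math


class Point:
    def __init__(self, r, t):
        self.r = r
        self.t = t

    @classmethod
    def from_cartesian(cls, x, y):
        r = (x**2 + y**2)**0.5
        t = math.atan2(y, x)
        return cls(r, t)

    @staticmethod
    def normalize_angle(t):
        # Plain function living in the class namespace; nothing is bound
        return t % (2 * math.pi)


class LabeledPoint(Point):
    pass


p = LabeledPoint.from_cartesian(3, 4)  # cls is LabeledPoint here
q = p.from_cartesian(0, 1)             # Works through an instance, too
```

Both p and q come out as LabeledPoint objects, because the class the method was accessed through is what gets passed in as cls.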

Descripting Some More

Fun fact about descriptors: they have a special __set_name__ method, which gets invoked the moment that the class within which they reside is created. As you might remember, this is how it works:

class A:
    d = D()

And while an instance a of A can happily access d, that poor D object doesn’t know where it was defined, or even under what name. Except…

class D:
    def __set_name__(self, cls, name):
        print(f'my name is {name}!')
        self.name = name

    def __get__(self, instance, cls):
        print(f'getting {cls.__name__}.{self.name}')
        return 42

This way, the moment A is created, having d inside it, Python invokes d.__set_name__(A, 'd'), which gives it an opportunity to learn and retain its name—later using it for its own purposes, like more informative messages:

>>> class A:
...     d = D()
my name is d!
>>> a = A()
>>> a.d
getting A.d
42

Even our setter from before, instead of:

if self._setter is None:
    raise AttributeError('not settable')

Which forces us to look at the code to figure out what exactly wasn’t settable, could do better with a name:

if self._setter is None:
    raise AttributeError(f"can't set property {self.name}")

That’s actually something I wish they’d add to standard propertys; I mean,

>>> class A:
...     @property
...     def p(self):
...         return 1
...
>>> a = A()
>>> a.p
1
>>> a.p = 2
Traceback (most recent call last):
  ...
AttributeError: can't set attribute

… Which one??? :[
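Putting the pieces together, a name-aware version of our own Property could look something like this. It is a sketch that combines the decorator-style Property with __set_name__; the error wording is our own choice, not what the built-in property produces:

```python
class Property:
    def __init__(self, f):
        self.getter = f
        self._setter = None
        self.name = None

    def __set_name__(self, cls, name):
        # Called once, when the owning class is created
        self.name = name

    def __get__(self, instance, cls):
        if instance is None:
            return self
        return self.getter(instance)

    def __set__(self, instance, value):
        if self._setter is None:
            raise AttributeError(f"can't set property {self.name}")
        self._setter(instance, value)

    def setter(self, f):
        self._setter = f
        return self


class A:
    @Property
    def p(self):
        return 1


a = A()
try:
    a.p = 2
except AttributeError as error:
    message = str(error)  # Now names the offending property
```

Recent Python versions have since made the built-in message more specific as well, though the exact wording varies by version.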

Compute Once, Use Everywhere

One advantage attributes have over properties is that they’re just a reference to a value, while properties have to run some code every single time they’re accessed. That might be negligible for converting polar to cartesian, but if you use properties carelessly, just because you think they’re cool:

class File:
    def __init__(self, path):
        self.path = path

    @property
    def data(self):
        # A with-block closes the file once it has been read
        with open(self.path) as f:
            return f.read()

You’ll end up with this:

>>> f = File('file.txt')
>>> f.data
'Hello, world!' # Pretty cool syntax!
>>> f.data
'Hello, world!' # Except, the file is read from disk every time

What’s worse, is that your users might be under the impression that this is just an attribute, and not associate any computational cost with it. I mean:

data = f.read_data()
if data == x:
    ...  # Do something
if data == y:
    ...  # Do something else

Is OK, but data = f.data? That’s just stupid. They’d probably end up with:

if f.data == x:
    ...  # Do something
if f.data == y:
    ...  # Do something else

Which is one line shorter, and one file read slower (or more, depending on the code). What if we could have a descriptor to implement lazy attributes; that is, actual values that get computed on their first access?

That’s actually brilliant for another reason: if your class computes some of its attributes (probably in its __init__, as it generates its initial state), and if this computation is expensive and the user ends up not using that bit—it’s wasted time, and it could be optimized away easily by only computing it on-demand. You might’ve figured out the solution already:

class File:
    def __init__(self, path):
        self.path = path
        self._data = None

    @property
    def data(self):
        if self._data is None:
            # A with-block closes the file once it has been read
            with open(self.path) as f:
                self._data = f.read()
        return self._data

But that’s ugly (especially if you have a lot of those). Let’s use descriptors!

class CachedProperty:
    def __init__(self, f):
        self.f = f

    def __get__(self, instance, cls):
        if instance is None:
            return self
        # Erm... where do we store cached values?

Remember, the same descriptor serves many instances, so if we just store it as self.value, all our cached properties will resolve to whichever of them was accessed first. We can try a dictionary, mapping something like the object ID to its value; but then, this dictionary will just grow indefinitely over time (although, you could do it with weakref.finalize or something). Instead, we can distribute it amongst the instances: every instance will keep its own value; and what better place to hold personal state than __dict__! In fact, we can even be so audacious as to store it there under our own name, since whenever someone accesses it, Python wires the call to our property object; that slot in the dictionary is just vacant. And so:

class CachedProperty:
    def __init__(self, f):
        self.f = f

    def __set_name__(self, cls, name):
        self.name = name

    def __get__(self, instance, cls):
        if instance is None:
            return self
        if self.name not in instance.__dict__:
            instance.__dict__[self.name] = self.f(instance)
        return instance.__dict__[self.name]

Let’s see that it works:

>>> class A:
...     @CachedProperty
...     def p(self):
...         print('computing...')
...         return 1
>>> a = A()
>>> a.__dict__
{}
>>> a.p
computing...
1
>>> a.__dict__
{'p': 1}
>>> a.p
1 # No computation!

Ha. We can even clear the cache by simply deleting the attribute, which will remove the value from __dict__ and cause a re-computation:

>>> del a.p
>>> a.p
computing...
1
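For comparison, the dictionary-based alternative mentioned earlier could be sketched roughly like this. It is not the approach we settled on: it keeps the values on the descriptor, keyed by id(instance), and leans on weakref.finalize to evict entries. The Report class is made up for the example:

```python
import weakref


class CachedProperty:
    def __init__(self, f):
        self.f = f
        self.cache = {}  # Maps id(instance) to that instance's value

    def __get__(self, instance, cls):
        if instance is None:
            return self
        key = id(instance)
        if key not in self.cache:
            self.cache[key] = self.f(instance)
            # When the instance is garbage-collected, drop its entry so
            # the dictionary doesn't grow forever (and so a recycled id
            # can't pick up a stale value)
            weakref.finalize(instance, self.cache.pop, key, None)
        return self.cache[key]


class Report:
    def __init__(self, numbers):
        self.numbers = numbers

    @CachedProperty
    def total(self):
        return sum(self.numbers)
```

It works, but it only supports instances that can be weakly referenced, and it needs noticeably more machinery than storing the value in the instance's own __dict__.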

Now that you get the gist, I’d like to address some misconceptions and inaccuracies:

1

Q: There should be a self.name = None in the __init__; what if __get__ is called before __set_name__, and crashes when it accesses self.name?

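A: That can’t happen for a descriptor defined in the class body, because __set_name__ gets invoked right when the class is created, and to call __get__ you have to have a class; it’s effectively like asking “what if anything gets called before __init__? It won’t work, because there aren’t any attributes!”, so I’d rather keep it short and sweet. (The one exception is a descriptor attached to an existing class afterwards, say with A.p = CachedProperty(f): Python doesn’t call __set_name__ then, so you’d have to call it yourself.)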

2

Q: Your __get__ code is needlessly complicated.

A: That’s actually true. This code would be better:

class CachedProperty:
    ... # Same as before
    def __get__(self, instance, cls):
        if instance is None:
            return self
        value = self.f(instance)
        instance.__dict__[self.name] = value
        return value

But to understand it, you have to understand a weird edge-case: when a descriptor defines __get__ only, it’s called a non-data descriptor, and __dict__ actually takes precedence over it; so accessing the attribute for the first time would invoke the property, which would store the value in __dict__ and effectively overshadow itself. The next time the attribute is accessed, the value will be returned straight from __dict__, without invoking the property at all.

However, if your descriptor also defines __set__ or __delete__, it’s called a data-descriptor, and its __get__ takes precedence over __dict__, so you’d have to implement caching like before. The rationale behind this is that the instance’s state should precede everything else—even dynamically computed properties; but if these properties have assignment or deletion logic, it means they’re actually managing their state themselves, and should be left to their own devices. It’s pretty cool that even when Python has a weird edge-case, it’s actually quite elegant.
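The precedence rule is easy to check by planting values straight into __dict__. In this minimal sketch, NonData and Data are throwaway descriptors invented for the demonstration:

```python
class NonData:
    # Defines __get__ only, so the instance's __dict__ wins
    def __get__(self, instance, cls):
        return 'from descriptor'


class Data:
    # Defining __set__ as well makes this a data descriptor, so it wins
    def __get__(self, instance, cls):
        return 'from descriptor'

    def __set__(self, instance, value):
        raise AttributeError('read-only')


class A:
    n = NonData()
    d = Data()


a = A()
a.__dict__['n'] = 'from __dict__'
a.__dict__['d'] = 'from __dict__'

n_value = a.n  # 'from __dict__': the stored value shadows the descriptor
d_value = a.d  # 'from descriptor': the data descriptor takes precedence
```

This is exactly why the shorter CachedProperty only works as long as it doesn't define __set__ or __delete__.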

3

Q: What about thread-safety?

A: Excellent point. In general, whenever you’re about to implement something like a cached property, look around first; you’ll find cached-property pretty quickly, and it’ll probably be better than what you write yourself, having stood the test of time and gained some mileage.
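Another ready-made option is functools.cached_property from the standard library, assuming Python 3.8 or newer. A quick sketch of it doing the same job as our class; the calls list is just a hypothetical way to count computations:

```python
from functools import cached_property

calls = []  # Records every time the property body actually runs


class A:
    @cached_property
    def p(self):
        calls.append('computing...')
        return 1


a = A()
first = a.p   # Computed and stored in a.__dict__
second = a.p  # Served from a.__dict__, no recomputation
del a.p       # Clears the cached value
third = a.p   # Computed again
```

Like ours, it stores the value in the instance's __dict__ under the property's name, so deleting the attribute resets the cache.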

Speed Typing

This is getting quite long—but I promised you typed properties, so let’s say a few words. The idea is simple—you define a property with some type, like so:

class A:
    x = TypedProperty(int)

    def __init__(self, x):
        self.x = x

And that protects x from being assigned the wrong value:

>>> a = A(1)
>>> a.x
1
>>> a.x = 'Hello, world!'
Traceback (most recent call last):
  ...
AttributeError: x must be int

You can do this one as an exercise—and then have a look at my solution:

class TypedProperty:
    def __init__(self, type):
        self.type = type

    def __set_name__(self, cls, name):
        self.name = name

    def __get__(self, instance, cls):
        if instance is None:
            return self
        # If it wasn't assigned, resolve to a default value:
        if self.name not in instance.__dict__:
            instance.__dict__[self.name] = self.type()
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, self.type):
            raise AttributeError(f'{self.name} must be '
                                 f'{self.type.__name__}')
        instance.__dict__[self.name] = value
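Here is a short usage sketch that also exercises the default-value branch. The Config class and its fields are invented for the example, and TypedProperty is repeated so the snippet runs on its own:

```python
class TypedProperty:
    def __init__(self, type):
        self.type = type

    def __set_name__(self, cls, name):
        self.name = name

    def __get__(self, instance, cls):
        if instance is None:
            return self
        # If it wasn't assigned, resolve to a default value:
        if self.name not in instance.__dict__:
            instance.__dict__[self.name] = self.type()
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, self.type):
            raise AttributeError(f'{self.name} must be '
                                 f'{self.type.__name__}')
        instance.__dict__[self.name] = value


class Config:
    retries = TypedProperty(int)
    name = TypedProperty(str)


c = Config()
default_retries = c.retries  # Never assigned, so it falls back to int()
c.retries = 3
try:
    c.name = 42
except AttributeError as e:
    error = str(e)
```

One thing to keep in mind: the check uses isinstance, so subclasses pass too, which means c.retries = True is accepted because bool is a subclass of int.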

Conclusion

Descriptors are awesome: whether for basic stuff like methods, more advanced stuff like properties and class methods, or writing magical infrastructure like all the big boys (Django, SQLAlchemy et al). Next time, we’ll talk about two more types of object behaviour: context management, and creation. See you then!