Understanding internals of Python classes

Last updated on February 09, 2018, in Python

The goal of this series is to describe the internals and general concepts behind the class object in Python 3.6. In this part, I will explain how Python stores and looks up attributes. I assume that you already have a basic understanding of object-oriented concepts in Python.

Let's start with a simple class:

class Vehicle:
    kind = 'car'

    def __init__(self, manufacturer, model):
        self.manufacturer = manufacturer
        self.model_name = model

    @property
    def name(self):
        return "%s %s" % (self.manufacturer, self.model_name)

    def __repr__(self):
        return "<%s>" % self.name

car = Vehicle('Toyota', 'Corolla')
print(car, car.kind)

Here, Vehicle is a class and car is an instance of that class.

The dot notation (e.g. car.kind) is called an attribute reference, which usually points either to a variable or a method (function).

Instance and class variables

The model_name is an instance variable, whose value belongs to a single instance. The kind, on the other hand, is a class variable, which is owned by the class.

It is important to understand the difference between them. Changing a class variable affects all instances.

>>> car = Vehicle('Toyota', 'Corolla')
>>> car2 = Vehicle('Honda', 'Civic')
>>> car.kind, car2.kind
('car', 'car')
>>> Vehicle.kind = 'scrap'
>>> car.kind, car2.kind
('scrap', 'scrap')

What happens when you change a class variable from the instance?

>>> car = Vehicle('Toyota', 'Corolla')
>>> car2 = Vehicle('Honda', 'Civic')
>>> car.kind, car2.kind
('car', 'car')
>>> car.kind = 'scrap'
>>> car.kind, car2.kind
('scrap', 'car')

As you can see, the value of the kind variable changes only for one instance. How is that possible?

Instead of changing the class variable, Python creates a new instance variable with the same name. Because instance variables take precedence over class variables during attribute lookup, car.kind now returns the instance's own value, while car2 still sees the class variable.
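One way to see this shadowing is to look at the instance's own namespace (the __dict__ attribute, covered later in this article). The sketch below uses a stripped-down Vehicle class with only the kind attribute, an assumption made so that the snippet runs on its own:

```python
class Vehicle:
    kind = 'car'  # class variable, stored on the class


car = Vehicle()
car2 = Vehicle()

# Assigning through the instance creates an entry on that instance only.
car.kind = 'scrap'
print(car.__dict__)   # {'kind': 'scrap'}
print(car2.__dict__)  # {}
print(Vehicle.kind)   # car

# Deleting the instance variable makes the class variable visible again.
del car.kind
print(car.kind)       # car
```

The class variable was never modified; it was only hidden behind the instance variable.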

Mutable class variables

You need to be very careful when working with mutable class variables (e.g., list, set, dictionary). Unlike with immutable types, you can modify them in place through an instance, and because the object is shared, the change is visible from every instance.

>>> class Test:
...     lst = [1,]
...
>>> t1 = Test()
>>> t2 = Test()
>>> t1.lst, t2.lst
([1], [1])
>>> t1.lst.append(2)
>>> t1.lst, t2.lst
([1, 2], [1, 2])

The rule of thumb here is to avoid class variables unless you have a reason to use them.
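If every instance should have its own list, a common alternative is to create it in __init__. This is a minimal sketch of that approach, reusing the Test class name from the example above:

```python
class Test:
    def __init__(self):
        # A fresh list is created for every instance.
        self.lst = [1]


t1 = Test()
t2 = Test()
t1.lst.append(2)
print(t1.lst, t2.lst)  # [1, 2] [1]
```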

How Python stores instance attributes

In Python, all instance variables are stored in a regular dictionary. When you work with attributes, you are just changing that dictionary.

We can access the instance dictionary through the __dict__ dunder (magic) attribute:

>>> car.__dict__
{'manufacturer': 'Toyota', 'model_name': 'Corolla'}
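To illustrate that attribute access and dictionary access are two views of the same data, here is a small sketch. The Point class is hypothetical and exists only for this example:

```python
class Point:  # hypothetical class used only for this illustration
    def __init__(self, x):
        self.x = x


p = Point(1)

# Setting an attribute adds a key to the instance dictionary...
p.y = 2
print(p.__dict__)  # {'x': 1, 'y': 2}

# ...and writing to the dictionary creates an attribute.
p.__dict__['z'] = 3
print(p.z)  # 3

# vars(obj) returns the same dictionary object as obj.__dict__.
print(vars(p) is p.__dict__)  # True
```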

Knowing this detail, we can save and later restore the state of an arbitrary instance:

>>> def from_dict(state):
...     # Create the instance without calling __init__
...     instance = Vehicle.__new__(Vehicle)
...     instance.__dict__.update(state)
...     return instance
...
>>> car = Vehicle('Toyota', 'Corolla')
>>> car
<Toyota Corolla>
>>> # Save the instance dict
... instance_state = car.__dict__.copy()
>>>
>>> # Delete the instance
... del car
>>> # Restore the instance from the dict
... car = from_dict(instance_state)
>>> car
<Toyota Corolla>

How Python stores class attributes

As I said earlier, class attributes are owned by the class itself (i.e., by its definition). As it turns out, classes use a dictionary too.

>>> Vehicle.__dict__
mappingproxy({'__module__': '__main__', 'kind': 'car',
'__init__': <function Vehicle.__init__ at 0x109228488>,
'name': <property object at 0x1091f8098>,
'__repr__': <function Vehicle.__repr__ at 0x109228598>,
'__dict__': <attribute '__dict__' of 'Vehicle' objects>,
'__weakref__': <attribute '__weakref__' of 'Vehicle' objects>, '__doc__': None})

The class dictionary can also be accessed from an instance through the __class__ dunder attribute (i.e., car.__class__.__dict__).

Class dictionaries are protected by mappingproxy, a read-only view of the underlying dictionary. Because the dictionary cannot be written to directly, attributes can only be set through regular attribute assignment, which guarantees that all attribute names are strings and helps to speed up attribute lookups.

Because all methods belong to a class, they are also stored in this dictionary.
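The read-only behavior of the proxy is easy to check. The sketch below again uses a stripped-down Vehicle class (an assumption, so that the snippet is self-contained) and prints whatever error message the interpreter produces:

```python
class Vehicle:
    kind = 'car'


# Direct item assignment on the proxy is rejected.
try:
    Vehicle.__dict__['kind'] = 'scrap'
except TypeError as exc:
    print('TypeError:', exc)

# Regular attribute assignment goes through the class machinery and works.
Vehicle.kind = 'scrap'
print(Vehicle.__dict__['kind'])  # scrap

# Attribute names must be strings when set this way.
try:
    setattr(Vehicle, 1, 'one')
except TypeError as exc:
    print('TypeError:', exc)
```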

Functions and methods

As you may know, a method is a function that belongs to a specific class. In Python 2, there were two kinds of methods: unbound and bound. Python 3 has only the latter.

Bound methods are associated with the data of the instance they are bound to:

>>> class Test:
...     def square(self, x):
...         return x ** 2
...
>>> t = Test()
>>> t.square
<bound method Test.square of <__main__.Test object at 0x10cdfdc50>>
>>> Test.square
<function Test.square at 0x10cd20c80>
>>> Test.__dict__['square']
<function Test.square at 0x10cd20c80>

We can access an instance from a bound method:

>>> bound_square = t.square
>>> bound_square(10)
100
>>> bound_square.__self__
<__main__.Test object at 0x10cdfdc50>

The class dictionary stores plain functions, which become methods when they are accessed with attribute syntax (dot notation) on an instance. Functions implement the descriptor protocol: every function has a __get__ method, which binds the function to an object.

Manual function binding:

>>> t = Test()
>>> def square(self, x):
...     return x ** 2
...
>>> print(square)
<function square at 0x109228730>
>>> bound_square = square.__get__(t, Test)
>>> print(bound_square)
<bound method square of <__main__.Test object at 0x10cdfdcf8>>
>>> print(bound_square(10))
100
>>> bound_square.__self__
<__main__.Test object at 0x10cdfdcf8>

As a result, a bound method supplies the first argument (i.e. self) automatically, so the caller omits it.
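Put differently, calling a bound method should be equivalent to calling the underlying function with the instance passed explicitly. A short sketch of that equivalence, reusing the Test class from above:

```python
class Test:
    def square(self, x):
        return x ** 2


t = Test()
bound = t.square

# The bound method remembers both the function and the instance.
print(bound.__func__ is Test.__dict__['square'])  # True
print(bound.__self__ is t)                        # True

# Calling the bound method is equivalent to passing the instance explicitly.
print(bound(10) == Test.square(t, 10))  # True
```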

There is a well-written explanation of how this works in Python's documentation: Functions and Methods and Method Objects.

Inheritance and attribute lookup order

Now you know that all variables and methods are stored in two dictionaries. It is time to understand how Python performs attribute lookup in case of inheritance.

Since every Python class implicitly inherits from object, there is always one level of inheritance.

>>> class A:
...     pass
...
>>>
>>> class B:
...     pass
...
>>>
>>> class C(A, B):
...     pass
...
>>> C.mro()
[<class '__main__.C'>, <class '__main__.A'>, <class '__main__.B'>, <class 'object'>]

The mro (Method Resolution Order) is a special method that returns the linearized order of classes in which Python searches for attributes.
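To see how this order affects lookups, here is a variation of the classes above with a few attributes added. The attribute names are made up for this illustration:

```python
class A:
    name = 'A'
    only_a = 'from A'


class B:
    name = 'B'
    only_b = 'from B'


class C(A, B):
    pass


print([cls.__name__ for cls in C.mro()])  # ['C', 'A', 'B', 'object']
print(C.name)    # A  (A comes before B in the MRO)
print(C.only_b)  # from B  (found further along the MRO)
```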

To fully understand the lookup order, you need to be familiar with the descriptor protocol. Basically, there are two types of descriptors:

If an object defines both __get__() and __set__(), it is considered a data descriptor. Descriptors that only define __get__() are called non-data descriptors (they are typically used for methods but other uses are possible).

Thus, because functions only implement __get__, they are called non-data descriptors.

Python uses the following order:

  • Data descriptors from class dictionary and its parents
  • Instance dictionary
  • Non-data descriptors from class dictionary and its parents

Keep in mind that no matter how many levels of inheritance you have, there is always a single instance dictionary, which stores all instance variables.
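The precedence rules above can be checked with two toy descriptors. All class names in this sketch are hypothetical, and the values are written straight into the instance dictionary to bypass __set__:

```python
class DataDescriptor:
    """Defines __get__ and __set__, so it is a data descriptor."""

    def __get__(self, obj, objtype=None):
        return 'from data descriptor'

    def __set__(self, obj, value):
        raise AttributeError('read-only')


class NonDataDescriptor:
    """Defines only __get__, so it is a non-data descriptor."""

    def __get__(self, obj, objtype=None):
        return 'from non-data descriptor'


class Demo:
    data = DataDescriptor()
    non_data = NonDataDescriptor()


d = Demo()
# Write straight into the instance dictionary, bypassing __set__.
d.__dict__['data'] = 'from instance'
d.__dict__['non_data'] = 'from instance'

print(d.data)      # from data descriptor
print(d.non_data)  # from instance
```

The data descriptor wins over the instance dictionary, while the instance dictionary wins over the non-data descriptor.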

Pseudo-code of attribute lookup:

def get_attribute(obj, name):
    """Return the raw object found for `name` and where it was found.

    Simplified: descriptors are returned as-is, without calling __get__.
    """
    class_definition = obj.__class__

    # Find the first class in the MRO whose dictionary contains the name
    descriptor = None
    for cls in class_definition.mro():
        if name in cls.__dict__:
            descriptor = cls.__dict__[name]
            break

    if hasattr(descriptor, '__set__'):
        # Data descriptor, return object
        return descriptor, 'data descriptor'

    # Check the instance dictionary
    if name in obj.__dict__:
        return obj.__dict__[name], 'instance attribute'

    if descriptor is not None:
        # Non-data descriptor (or a plain class attribute)
        return descriptor, 'non-data descriptor'
    else:
        raise AttributeError(name)

>>> car = Vehicle('Toyota', 'Corolla')
>>> get_attribute(car, 'kind')
('car', 'non-data descriptor')
>>> get_attribute(car, 'name')
(<property object at 0x108815098>, 'data descriptor')
>>> get_attribute(car, 'model_name')
('Corolla', 'instance attribute')
>>> get_attribute(car, '__repr__')
(<function Vehicle.__repr__ at 0x10883e598>, 'non-data descriptor')
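The pseudo-code returns the descriptor object itself. As a rough sketch of the remaining step, the version below also invokes __get__, so its result can be compared with the built-in getattr(). The lookup function is my own simplification (it ignores __getattr__ and metaclasses, for instance), and the Vehicle class is repeated in shortened form so that the snippet runs on its own:

```python
def lookup(obj, name):
    """Simplified sketch of attribute lookup on an instance."""
    cls = type(obj)

    # Find the first class in the MRO that defines the name.
    found = False
    class_attr = None
    for klass in cls.__mro__:
        if name in vars(klass):
            class_attr = vars(klass)[name]
            found = True
            break

    attr_type = type(class_attr)
    has_get = hasattr(attr_type, '__get__')
    is_data = hasattr(attr_type, '__set__') or hasattr(attr_type, '__delete__')

    # 1. Data descriptors win over the instance dictionary.
    if found and has_get and is_data:
        return attr_type.__get__(class_attr, obj, cls)

    # 2. Instance dictionary (absent when __slots__ is used).
    instance_dict = getattr(obj, '__dict__', {})
    if name in instance_dict:
        return instance_dict[name]

    # 3. Non-data descriptors, then plain class attributes.
    if found:
        if has_get:
            return attr_type.__get__(class_attr, obj, cls)
        return class_attr

    raise AttributeError(name)


class Vehicle:
    kind = 'car'

    def __init__(self, manufacturer, model):
        self.manufacturer = manufacturer
        self.model_name = model

    @property
    def name(self):
        return "%s %s" % (self.manufacturer, self.model_name)


car = Vehicle('Toyota', 'Corolla')
for attr in ('kind', 'name', 'model_name'):
    # The sketch should agree with the built-in getattr().
    print(attr, lookup(car, attr) == getattr(car, attr))
```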

__slots__

When dealing with thousands of instances, memory consumption can be a problem. Because a dictionary is implemented as a hash table, creating one for each instance takes a lot of memory. Fortunately, Python provides a way to disable the per-instance dictionary by defining the __slots__ attribute.

Here is how slots are usually defined:

class Car:
    __slots__ = 'model_name', 'manufacturer'

    def __init__(self, manufacturer, model):
        self.manufacturer = manufacturer
        self.model_name = model

>>> c = Car('Toyota', 'Corolla')
>>> c.manufacturer
'Toyota'
>>> c.__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Car' object has no attribute '__dict__'

When you define slots, instead of creating a dictionary for each instance, Python stores attribute values in a compact, list-like structure of fixed size. The attribute names, in turn, are moved to the class dictionary.

>>> Car.__dict__
mappingproxy({'__module__': 'slots', '__slots__': ('model_name', 'manufacturer'),
 '__init__': <function Car.__init__ at 0x1015e8d90>, 'manufacturer': <member 'manufacturer' of 'Car' objects>,
 'model_name': <member 'model_name' of 'Car' objects>, '__doc__': None})

At the class level, each slot has a descriptor that knows its unique position in the instance list. There is a good explanation of how it works by Raymond Hettinger. Although it was written 10 years ago, the concept stays the same.
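The slot descriptors can be used directly, which makes their role a bit more visible. A small sketch, reusing the Car class from above:

```python
class Car:
    __slots__ = 'model_name', 'manufacturer'

    def __init__(self, manufacturer, model):
        self.manufacturer = manufacturer
        self.model_name = model


c = Car('Toyota', 'Corolla')

# The slot descriptor stored on the class reads the value from the instance.
slot = Car.__dict__['model_name']
print(slot.__get__(c, Car))  # Corolla

# Without a per-instance dictionary, undeclared attributes are rejected.
try:
    c.color = 'red'
except AttributeError as exc:
    print('AttributeError:', exc)
```

The second part shows the trade-off of slots: you cannot add attributes that were not declared in __slots__.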

Bonus: function attributes

The dictionary is so fundamental to Python that many other objects use it too. Since Python 2.1, functions can have arbitrary attributes; that is, you can use a function as a key-value storage.

def func():
    print(func.a)
    func.a -= 10


func.a = 10
func.foo = "bar"
func()
func.a += 2
func()
print(func.__dict__)

Internally, it's just a dictionary that handles failed attribute lookups (i.e., nondefault attributes). You can access or even replace this dictionary using the already familiar __dict__ attribute. PEP 232 has an extensive description of this feature.

For example, you can track the number of times a function was called:

def func(a, b):
    func.ncalls += 1
    return a + b


func.ncalls = 0

func(1, 2)
func(3, 2)
print(func.ncalls)
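The same idea can be packaged as a decorator, so that the counter does not have to be initialized by hand for every function. The count_calls name is hypothetical; this is only a sketch of one possible approach:

```python
import functools


def count_calls(func):
    """Hypothetical decorator that stores a call counter on the wrapper."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.ncalls += 1
        return func(*args, **kwargs)

    # The counter lives in the wrapper function's own __dict__.
    wrapper.ncalls = 0
    return wrapper


@count_calls
def add(a, b):
    return a + b


add(1, 2)
add(3, 2)
print(add.ncalls)  # 2
```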


The explanation of attributes got a lot longer than I expected, so I will split the article about internals into a series of posts. If you don't want to miss the follow-up on this topic, you can subscribe to my RSS.