Understanding internals of Python classes
The goal of this series is to describe the internals and general concepts behind the class
object in Python 3.6. In this part, I will explain how Python stores and looks up attributes. I assume that you already have a basic understanding of object-oriented concepts in Python.
Let's start with a simple class:
class Vehicle:
    kind = 'car'

    def __init__(self, manufacturer, model):
        self.manufacturer = manufacturer
        self.model_name = model

    @property
    def name(self):
        return "%s %s" % (self.manufacturer, self.model_name)

    def __repr__(self):
        return "<%s>" % self.name

car = Vehicle('Toyota', 'Corolla')
print(car, car.kind)
Here, Vehicle is a class and car is an instance of that class.
The dot notation (e.g., car.kind) is called an attribute reference; it usually points to either a variable or a method (function).
Instance and class variables
model_name is called an instance variable: its value belongs to a single instance. On the other hand, kind is a class variable: it is owned by the class.
It is important to understand the difference between them. Changing a class variable affects all instances.
>>> car = Vehicle('Toyota', 'Corolla')
>>> car2 = Vehicle('Honda', 'Civic')
>>> car.kind, car2.kind
('car', 'car')
>>> Vehicle.kind = 'scrap'
>>> car.kind, car2.kind
('scrap', 'scrap')
What happens when you change a class variable from the instance?
>>> car = Vehicle('Toyota', 'Corolla')
>>> car2 = Vehicle('Honda', 'Civic')
>>> car.kind, car2.kind
('car', 'car')
>>> car.kind = 'scrap'
>>> car.kind, car2.kind
('scrap', 'car')
As you can see, the value of kind changes for only one instance. How is that possible?
Instead of changing the class variable, Python creates a new instance variable with the same name. Instance variables take precedence over class variables when Python searches for an attribute value.
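The shadowing can be observed directly. The sketch below is a minimal illustration that uses a stripped-down Vehicle class (an assumption made to keep the example short) and peeks at the instance dictionary, __dict__, which is covered later in this article.

```python
class Vehicle:
    kind = 'car'  # class variable, stored on the class


car = Vehicle()
car2 = Vehicle()

# Before the assignment the instance dictionary is empty,
# so car.kind is resolved through the class.
before = dict(car.__dict__)

# Assigning through the instance adds a key to the instance dictionary;
# the class variable itself is left untouched.
car.kind = 'scrap'
after = dict(car.__dict__)
print(before, after, Vehicle.kind, car2.kind)

# Removing the instance variable makes the class variable visible again.
del car.kind
print(car.kind)
```

Deleting the instance variable does not touch the class, which is why the last lookup falls back to the value stored on Vehicle.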
Mutable class variables
You need to be very careful when working with mutable class variables (e.g., a list, set, or dictionary). Unlike immutable values, they can be modified in place through an instance, and the change is visible to every other instance.
>>> class Test:
...     lst = [1,]
...
>>> t1 = Test()
>>> t2 = Test()
>>> t1.lst, t2.lst
([1], [1])
>>> t1.lst.append(2)
>>> t1.lst, t2.lst
([1, 2], [1, 2])
The rule of thumb here is to avoid class variables unless you have a reason to use them.
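A common way to avoid the shared state is to create the mutable object in __init__ instead. The sketch below contrasts the two approaches; the SharedLog and OwnLog classes are hypothetical names used only for illustration.

```python
class SharedLog:
    entries = []  # one list object shared by every instance

    def add(self, item):
        self.entries.append(item)


class OwnLog:
    def __init__(self):
        self.entries = []  # a fresh list for each instance

    def add(self, item):
        self.entries.append(item)


s1, s2 = SharedLog(), SharedLog()
s1.add('a')
print(s2.entries)  # the change made through s1 is visible through s2

o1, o2 = OwnLog(), OwnLog()
o1.add('a')
print(o2.entries)  # o2 has its own, still empty, list
```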
How Python stores instance attributes
In Python, all instance variables are stored in a regular dictionary. When you work with attributes, you are just changing that dictionary.
We can access the instance dictionary through the __dict__ dunder (magic) attribute:
>>> car.__dict__
{'manufacturer': 'Toyota', 'model_name': 'Corolla'}
Knowing this detail, we can save and later restore the state of an arbitrary instance:
>>> def from_dict(dict):
...     instance = Vehicle.__new__(Vehicle)
...     instance.__dict__.update(dict)
...     return instance
...
>>> car = Vehicle('Toyota', 'Corolla')
>>> car
<Toyota Corolla>
>>> # Save the instance dict
... instance_state = car.__dict__.copy()
>>>
>>> # Delete the instance
... del car
>>> # Restore the instance from the dict
... car = from_dict(instance_state)
>>> car
<Toyota Corolla>
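The link between attributes and the instance dictionary works in both directions. The following sketch, which assumes a reduced version of the Vehicle class, shows that writing a key into the dictionary creates an attribute and that setting an attribute adds a key.

```python
class Vehicle:
    def __init__(self, manufacturer, model):
        self.manufacturer = manufacturer
        self.model_name = model


car = Vehicle('Toyota', 'Corolla')

# vars(obj) returns obj.__dict__ for objects that have one.
state = vars(car)
assert state is car.__dict__

# Writing a key into the dictionary creates an attribute...
state['color'] = 'red'
print(car.color)

# ...and setting an attribute adds a key to the dictionary.
car.year = 2018
print(sorted(state))
```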
How Python stores class attributes
As I said earlier, class attributes are owned by the class itself (i.e., by its definition). As it turns out, classes use a dictionary too.
>>> Vehicle.__dict__
mappingproxy({'__module__': '__main__', 'kind': 'car',
'__init__': <function Vehicle.__init__ at 0x109228488>,
'name': <property object at 0x1091f8098>,
'__repr__': <function Vehicle.__repr__ at 0x109228598>,
'__dict__': <attribute '__dict__' of 'Vehicle' objects>,
'__weakref__': <attribute '__weakref__' of 'Vehicle' objects>, '__doc__': None})
The class dictionary can also be accessed from an instance through the __class__ dunder attribute (i.e., car.__class__.__dict__).
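Class dictionaries are wrapped in a mappingproxy, a read-only view of the underlying dictionary. Because the dictionary can only be changed through attribute assignment on the class, Python can guarantee that all attribute names are strings, which helps to speed up attribute lookups. The downside is that you cannot modify the dictionary directly.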
Because all methods belong to a class, they are also stored in this dictionary.
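The read-only behaviour is easy to check. This is a small sketch with a reduced Vehicle class; the rejected flag is a hypothetical helper variable used only to record the outcome.

```python
class Vehicle:
    kind = 'car'


proxy = Vehicle.__dict__
print(type(proxy).__name__)

# Item assignment on the proxy is rejected with a TypeError...
rejected = False
try:
    proxy['kind'] = 'scrap'
except TypeError:
    rejected = True
print(rejected)

# ...but regular attribute assignment on the class works,
# and the proxy reflects the change because it is a live view.
Vehicle.kind = 'scrap'
print(proxy['kind'])
```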
Functions and methods
As you may know, a method is a function that belongs to a specific class. In Python 2, there were two kinds of methods: unbound and bound. Python 3 has only the latter; accessing a method through the class returns a plain function.
Bound methods are associated with the data of the instance they are bound to:
>>> class Test:
...     def square(self, x):
...         return x ** 2
...
>>> t = Test()
>>> t.square
<bound method Test.square of <__main__.Test object at 0x10cdfdc50>>
>>> Test.square
<function Test.square at 0x10cd20c80>
>>> Test.__dict__['square']
<function Test.square at 0x10cd20c80>
We can access an instance from a bound method:
>>> bound_square = t.square
>>> bound_square(10)
100
>>> bound_square.__self__
<__main__.Test object at 0x10cdfdc50>
The class dictionary stores functions, which become methods when they are accessed through an instance with attribute syntax (dot notation). Thanks to the descriptor protocol, every function has a __get__ method, which binds the function to an object.
Manual function binding:
>>> t = Test()
>>> def square(self, x):
...     return x ** 2
...
>>> print(square)
<function square at 0x109228730>
>>> bound_square = square.__get__(t,Test)
>>> print(bound_square)
<bound method square of <__main__.Test object at 0x10cdfdcf8>>
>>> print(bound_square(10))
100
>>> bound_square.__self__
<__main__.Test object at 0x10cdfdcf8>
As a result, a bound method supplies the first argument (i.e., self) of the function automatically, so you do not pass it yourself.
There is a well-written explanation of how it works in Python's documentation: Functions and Methods and Method Objects.
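To make the relationship between a function and its bound method concrete, here is a short sketch. It reuses the Test class from above and builds a bound method by hand with types.MethodType from the standard library; the variable names are only illustrative.

```python
import types


class Test:
    def square(self, x):
        return x ** 2


t = Test()
bound = t.square

# A bound method remembers both the instance and the original function.
assert bound.__self__ is t
assert bound.__func__ is Test.__dict__['square']

# Calling the bound method is equivalent to calling the function
# with the instance passed explicitly as the first argument.
print(bound(10), Test.square(t, 10))

# types.MethodType builds the same kind of object by hand.
manual = types.MethodType(Test.__dict__['square'], t)
print(manual(10), manual == bound)
```

Note that every access to t.square creates a new bound method object, so compare bound methods with == rather than with is.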
Inheritance and attribute lookup order
Now you know that all variables and methods are stored in two dictionaries. It is time to understand how Python performs attribute lookup when inheritance is involved.
Since every Python class implicitly inherits from object, there is always at least one level of inheritance.
>>> class A:
...     pass
...
>>>
>>> class B:
...     pass
...
>>>
>>> class C(A, B):
...     pass
...
>>> C.mro()
[<class '__main__.C'>, <class '__main__.A'>, <class '__main__.B'>, <class 'object'>]
mro (Method Resolution Order) is a special method that returns the linearized order of classes in which Python searches for attributes.
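The effect of this order on lookups can be shown with a small sketch. The name, only_a, and only_b attributes are hypothetical additions to the A, B, and C classes, made only for this illustration.

```python
class A:
    name = 'A'
    only_a = 'from A'


class B:
    name = 'B'
    only_b = 'from B'


class C(A, B):
    pass


# C.__mro__ holds the same order as C.mro(), stored as a tuple.
order = [cls.__name__ for cls in C.__mro__]
print(order)

# 'name' is defined in both parents; A comes first in the MRO, so A wins.
print(C.name)

# Attributes defined in only one parent are still found further down the MRO.
print(C.only_b)
```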
To fully understand the lookup order, you need to be familiar with the Descriptor Protocol. But basically, there are two types of descriptors:
If an object defines both __get__() and __set__(), it is considered a data descriptor. Descriptors that only define __get__() are called non-data descriptors (they are typically used for methods but other uses are possible).
Thus, because functions only implement __get__, they are non-data descriptors.
Python uses the following order when looking up an attribute on an instance:
- Data descriptors from class dictionary and its parents
- Instance dictionary
- Non-data descriptors and other class attributes from class dictionary and its parents
Keep in mind that no matter how many levels of inheritance you have, there is always one instance dictionary, which stores all instance variables.
Simplified pseudo-code of attribute lookup (it returns the descriptor object itself instead of calling its __get__ method):
def get_attribute(obj, index):
    class_definition = obj.__class__

    # Find the first class in the MRO that defines the attribute
    descriptor = None
    for cls in class_definition.mro():
        if index in cls.__dict__:
            descriptor = cls.__dict__[index]
            break

    if hasattr(descriptor, '__set__'):
        # Data descriptor, return object
        return descriptor, 'data descriptor'

    # Check the instance dictionary
    if index in obj.__dict__:
        return obj.__dict__[index], 'instance attribute'

    if descriptor is not None:
        # Non-data descriptor (or a plain class attribute)
        return descriptor, 'non-data descriptor'
    else:
        raise AttributeError(index)
>>> car = Vehicle('Toyota', 'Corolla')
>>> get_attribute(car, 'kind')
('car', 'non-data descriptor')
>>> get_attribute(car, 'name')
(<property object at 0x108815098>, 'data descriptor')
>>> get_attribute(car, 'model_name')
('Corolla', 'instance attribute')
>>> get_attribute(car, '__repr__')
(<function Vehicle.__repr__ at 0x10883e598>, 'non-data descriptor')
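The precedence rules can also be verified against the real lookup machinery. The sketch below defines two hypothetical descriptor classes, DataDescriptor and NonDataDescriptor, and places values with the same names into the instance dictionary to see which one wins.

```python
class DataDescriptor:
    """Defines __get__ and __set__, so it is a data descriptor."""

    def __get__(self, obj, objtype=None):
        return 'from data descriptor'

    def __set__(self, obj, value):
        raise AttributeError('read-only')


class NonDataDescriptor:
    """Defines only __get__, so it is a non-data descriptor."""

    def __get__(self, obj, objtype=None):
        return 'from non-data descriptor'


class Demo:
    data = DataDescriptor()
    non_data = NonDataDescriptor()


d = Demo()
# Write to the instance dictionary directly to bypass __set__.
d.__dict__['data'] = 'from instance dict'
d.__dict__['non_data'] = 'from instance dict'

print(d.data)      # the data descriptor wins over the instance dictionary
print(d.non_data)  # the instance dictionary wins over the non-data descriptor
```

This is the same reason a property cannot be shadowed by an instance variable, while a regular method can.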
__slots__
When dealing with thousands of instances, memory consumption can be a problem. Because of the underlying hash table implementation, creating a dictionary for each instance takes a lot of memory. Fortunately, Python provides a way to disable the per-instance dictionary by defining the __slots__ attribute.
Here is how slots are usually defined:
class Car:
    __slots__ = 'model_name', 'manufacturer'

    def __init__(self, manufacturer, model):
        self.manufacturer = manufacturer
        self.model_name = model

>>> c = Car('Toyota', 'Corolla')
>>> c.manufacturer
'Toyota'
>>> c.__dict__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'Car' object has no attribute '__dict__'
When you define slots, Python does not create a dictionary for each instance; attribute values are stored in a compact, fixed-size array inside the instance (you can think of it as a list). The attribute names, in turn, are moved to the class dictionary.
>>> Car.__dict__
mappingproxy({'__module__': 'slots', '__slots__': ('model_name', 'manufacturer'),
'__init__': <function Car.__init__ at 0x1015e8d90>, 'manufacturer': <member 'manufacturer' of 'Car' objects>,
'model_name': <member 'model_name' of 'Car' objects>, '__doc__': None})
At the class level, each slot has a descriptor that knows its unique position in the instance's array. There is a good explanation of how it works by Raymond Hettinger. Although it was written 10 years ago, the concept stays the same.
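The slot descriptors can be used directly, which shows that they are what actually reads and writes the values. This is a minimal sketch based on the Car class above; the rejected flag is a hypothetical helper variable.

```python
class Car:
    __slots__ = ('model_name', 'manufacturer')

    def __init__(self, manufacturer, model):
        self.manufacturer = manufacturer
        self.model_name = model


c = Car('Toyota', 'Corolla')

# Each slot is exposed on the class as a descriptor object.
slot = Car.__dict__['manufacturer']
print(slot.__get__(c, Car))

# The descriptor can also write the value, just like attribute assignment.
slot.__set__(c, 'Honda')
print(c.manufacturer)

# Without a per-instance dictionary, undeclared attributes are rejected.
try:
    c.color = 'red'
except AttributeError:
    rejected = True
else:
    rejected = False
print(rejected)
```

Since slot descriptors define both __get__ and __set__, they are data descriptors in the sense described earlier.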
Bonus: function attributes
The dictionary is so fundamental to Python that many other objects use it too. Since Python 2.1, functions can have arbitrary attributes; that is, you can use a function as key-value storage.
def func():
    print(func.a)
    func.a -= 10

func.a = 10
func.foo = "bar"
func()
func.a += 2
func()
print(func.__dict__)
Internally, these attributes live in the function's own dictionary, which is consulted for attributes that are not built into function objects (i.e., nondefault attributes). You can access or even replace this dictionary using the already familiar __dict__ attribute. PEP 232 has an extensive description of this feature.
For example, you can track the number of times a function was called:
def func(a, b):
    func.ncalls += 1
    return a + b

func.ncalls = 0
func(1, 2)
func(3, 2)
print(func.ncalls)
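The same idea can be packaged so that it works for any function. The sketch below is one possible approach, not part of the original example: count_calls is a hypothetical decorator that stores the counter as an attribute of the wrapper function.

```python
import functools


def count_calls(func):
    """Hypothetical decorator that keeps a call counter on the wrapper."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.ncalls += 1
        return func(*args, **kwargs)

    wrapper.ncalls = 0  # stored in wrapper.__dict__
    return wrapper


@count_calls
def add(a, b):
    return a + b


add(1, 2)
add(3, 2)
print(add.ncalls)
print(add.__dict__['ncalls'])
```

Keeping the counter on the wrapper avoids touching the decorated function's own attributes, at the cost of one extra function call per invocation.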
The explanation of attributes got a lot longer than I expected and I will split the article about internals into a series of posts. If you don't want to miss the follow-up on this topic, you can subscribe to my RSS.
Comments
- Aidas Bendoraitis 2018-01-29 #
Very detailed and comprehensive article. Thanks!
I would just like to have information about __slots__ included. And I would rename the from_dict(dict) to from_dict(dictionary), because dict is a reserved keyword for the dictionary type.
- Artem 2018-01-29 #
Next article starts with slots. I will finish it soon.
Good catch about dict!
upd: Actually, I changed my mind and added description about slots to this article.
- Tony Su 2019-12-18 #
I happened to read this article and it really solved many puzzles in my mind of how Python does this and that under the table and why.
Really appreciated it!!!
- Sam B 2020-03-29 #
Great article and one of the better articles that describes classes and init I am new to python and one thing I am still unclear about is why should I use classes when I can use a dictionary? They both can store large datasets. And how often are classes used versus dictionaries? Of everything that I have read (keep in mind, at the beginner) level, dictionaries seem to be preferred.
- Artem 2020-04-03 #
I always prefer dictionaries. I don't need classes to just store some data, but you can use them to add extra input validation for each field.
- Dan 2022-11-13 #
Hi. Could you please help me understand the following?
print(isinstance(object,type)) #True
print(issubclass(object,type)) #False
print(isinstance(type,object)) #True
print(issubclass(type, object)) #True
How is it possible for a superclass (type Class from which object Class was created) to inherit from it's subclass (object) ?
print(object.__class__) #type
print(object.__base__) #None
print(type(object)) #type
print(type.__class__) #type
print(type.__base__) #object
print(type(type)) #type
Thank you!
- Artem 2022-11-19 #
Both type and object are special classes. I guess you can think of object as a specially crafted featureless type class. For example, you can't assign new attributes to the object instances, so it's not your usual class.
double -> square
;)