# Python internals: Arbitrary-precision integer implementation

Have you ever noticed that Python supports integers of any size? Here is a quick glance at it.

Python represents all objects by C structures. The following data structure is responsible for all integer objects:

struct _longobject { PyObject_VAR_HEAD digit ob_digit[1]; } PyLongObject;

After macros expansions, a simplified version of the struct looks as follows:

struct { ssize_t ob_refcnt; struct _typeobject *ob_type; ssize_t ob_size; uint32_t ob_digit[1]; };

The `ob_refcnt`

field is responsible for reference counting technique which is used in garbage collection mechanism, whereas `ob_type`

is a pointer to a structure which describes an integer type.

Generally, In languages like C/C++, the precision of integers is limited to 64-bit, but Python has built-in support for Arbitrary-precision integers. Since Python 3 there is no longer simple integer type, and all integers are represented as a bignum.

## How to store arbitrarily large integer?

One of the solutions is to represent the integer as an array of digits. To do it in an efficient way we need to convert our number from base 10 to base $2^{30}$ numeral system, where each element represents a single 'digit' in a range from $0$ to $2^{30}-1$. Depending on the platform, Python uses either 32-bit unsigned integer arrays with 30-bit digits or 16-bit unsigned integer arrays with 15-bit digits. Using such approach introduces additional requirements, that's why we can't use all bits of the integer. The `ob_digit`

field in a structure above is responsible for such arrays.

To eliminate unnecessary computation CPython has a "fast path" implementation for integers in a range of $-2^{30}$ to $2^{30}$. Such integers are stored as an array of one element and if it's possible treated as fixed 32-bit integers.

Note that, unlike classical approach, the sign of an integer is stored separately in `ob_size`

field. This field stores the size of the `ob_digit`

array. So if you want to change the sign of an array of size 2 you need to set `ob_size`

to -2.

The comment from the source code describes it as follows:

/* Long integer representation. The absolute value of a number is equal to SUM(for i=0 through abs(ob_size)-1) ob_digit[i] * 2**(SHIFT*i) Negative numbers are represented with ob_size < 0; zero is represented by ob_size == 0. In a normalized number, ob_digit[abs(ob_size)-1] (the most significant digit) is never zero. Also, in all cases, for all valid i, 0 <= ob_digit[i] <= MASK. The allocation function takes care of allocating extra memory so that ob_digit[0] ... ob_digit[abs(ob_size)-1] are actually available. CAUTION: Generic code manipulating subtypes of PyVarObject has to aware that integers abuse ob_size's sign bit. */

The representation of 123456789101112131415 will look as follows:

ob_size | 3 | ||

ob_digit | 437976919 | 87719511 | 107 |

Algorithm of converting our representation back:

## Optimization of commonly-used integers

Small integer objects in a range of -5 to 256 are always pre-allocated during initialization. Because Python integers are immutable, we can use them as singletons. Every time you need to create small integer instead of creating new object Python just points to already allocated one. Thereby it saves a lot of space and computation for commonly-used integers.

Interestingly enough, the `PyLongObject`

structure takes at least 28 bytes for every allocated integer and therefore takes three times as much memory as a simple 64-bit C integer.

## Number system conversion

Let's see how Python converts integers to an array.

We are going to look at the simple example where we need to convert a long unsigned 64-bit integer to Python's integer representation. Note that it's the biggest possible standard C type that can be converted to our `PyLongObject`

directly. Larger numbers can be converted from strings or byte arrays.

If you are not familiar with number system conversion, you can read this tutorial.

Here is a simplified algorithm translated from C to Python:

SHIFT = 30 # number of bits for each 'digit' MASK = (2 ** SHIFT) bignum = 18446744073709551615 def split_number(bignum): t = abs(bignum) num_list = [] while t != 0: # Get remainder from division small_int = t % MASK # more efficient bitwise analogue: (t & (MASK-1)) num_list.append(small_int) # Get integral part of the division (floor division) t = t // MASK # more efficient bitwise analogue: t >>= SHIFT return num_list def restore_number(num_list): bignum = 0 for i, n in enumerate(num_list): bignum += n * (2 ** (SHIFT * i)) return bignum num_list = split_number(bignum) assert bignum == restore_number(num_list)

To verify our solution we can also take a look at the internal representation:

import ctypes class PyLongObject(ctypes.Structure): _fields_ = [("ob_refcnt", ctypes.c_long), ("ob_type", ctypes.c_void_p), ("ob_size", ctypes.c_ulong), ("ob_digit", ctypes.c_uint * 3)] bignum = 18446744073709551615 for d in PyLongObject.from_address(id(bignum)).ob_digit: print(d)

## Arithmetic

Basic arithmetic operations are implemented using simple school math with only one difference — every element of an array is treated like a single "digit".

Every operation results in a minimum one object creation. For example, when you want to execute `c += 10`

roughly these steps are performed:

- Get an address of an already allocated object for
`10`

(because it's a small number we don't have to create a new object, see above). - Create a new integer object so we can store the result of addition in it.
- Add
`c`

and`10`

together. Store the result in the newly created object. - Replace variable
`c`

with reference to our new object. - Decrease the reference count of an old
`c`

so garbage collector can destroy this object in the future.

For example, let's take a look at addition, where carrying is used:

def add_bignum(a, b): z = [] if len(a) < len(b): # Ensure a is the larger of the two a, b = b, a carry = 0 for i in range(0, len(b)): carry += a[i] + b[i] z.append(carry % MASK) carry = carry // MASK for i in range(i + 1, len(a)): carry += a[i] z.append(carry % MASK) carry = carry // MASK z.append(carry) # remove trailing zeros i = len(z) while i > 0 and z[i-1] == 0: i -= 1 z = z[0:i] return z a = 8223372036854775807 b = 100037203685477 assert restore_number(add_bignum(split_number(a), split_number(b))) == a + b

## Diving deeper

There are a lot of small details that can't be described in one article. You can read more about integers in CPython (1, 2, 3) source code.

I have plans to continue this series, in the next posts I will describe internals of class and function objects.

Want a monthly digest of these blog posts?

# Comments

- Artem 2 years, 3 months ago (from disqus) #
Thanks, I've fixed the bug. Original implementation uses subtraction function for negative addition.

- mingrammer 2 years, 3 months ago (from disqus) #
Thanks for great article.

So, I wanna translate it into korean on my personal blog

Can I do it? :) (Surely, I will link this original post)

Thanks

- Artem 2 years, 3 months ago (from disqus) #
Yes, you can. It would be cool. Just mention original post in your article.

- motleytech 2 years, 3 months ago (from disqus) #
Good article.

It might be interesting to note the Karatsuba multiplication method which is implemented for these integers. Its much faster than naive multiplication (when dealing with large numbers).

- Régis 2 years, 3 months ago (from disqus) #
There's a small typo: "So if you want to change the sign of an array of size 2 you need to set ob_digit to -2" should read "So if you want to change the sign of an array of size 2 you need to set ob_size to -2"

- Влад Токарев 1 year, 6 months ago (from disqus) #
Great! It's very useful kind of articles that about internals of some software... Thank you! Write else! :-)

- debashish palit 1 year, 5 months ago (from disqus) #
Nice article. One thing I didn't get is the "remove trailing zeros" bit in

`add_bignum`

function. Going by your Python code,`carry`

can be either 0 or 1. So, only one trailing zero may be present. So, is the`while`

loop needed?- Artem 1 year, 5 months ago (from disqus) #
Good catch! In this case yes, it's not needed. Original C code uses a helper function with a while loop, but it's also used in other functions where more than 1 zeros can happen.

- roachsinai 9 months, 2 weeks ago (from disqus) #
Thanks for this very helpful post! But could you tell me more detail about "To eliminate unnecessary computation CPython has a "fast path" implementation for integers in a range of 2^(-30) - 2^30? In my mind python only story integer -5~257.

I found a bug in the last line of function

`add_bignum`

. The final assert fails for example for numbers`a=2**30 - 1; b=1`

. It should also be noted that this function has been simplified to work only for non-negative numbers.