How to patch Python bytecode

Last updated on December 10, 2017, in python

In CPython, the standard Python implementation, the source code of a script is first compiled into platform-independent bytecode, which then runs on CPython's stack-based virtual machine.

Code objects

Code objects represent blocks of bytecode. According to the Python documentation, a block is a piece of program text that is executed as a unit; the most common blocks are a module, a function body, and a class definition. A code object is produced whenever such a block is compiled, either at startup or at run time (for example, by compile() or exec()).
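A minimal sketch of this using the built-in compile(): compiling module source yields a code object, and the code object of each function defined in it is stored among the module's constants. The source string and the function name `square` are illustrative only.

```python
import types

# Compiling a module-level block produces a code object...
module_code = compile("def square(n):\n    return n * n\n", "<demo>", "exec")
print(type(module_code))  # <class 'code'>

# ...and the function body's own code object is stored as one of its constants.
nested = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]
print([c.co_name for c in nested])  # ['square']

# Executing the module code creates the function, which uses that code object.
namespace = {}
exec(module_code, namespace)
print(namespace["square"](7))  # 49
```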

You can access code objects from Python code. Let's create a function to play with:

def wadd(x, y=1):
    pow_n = 3
    result = (x + y) ** pow_n
    return abs(result)

A function's code object is stored in its __code__ attribute. Let's explore its attributes:

>>> for attr in dir(wadd.__code__):
...     if attr.startswith('co_'):
...         print("\t%s = %s" % (attr, getattr(wadd.__code__, attr)))
...
    co_argcount = 2
    co_cellvars = ()
    co_code = b'd\x01}\x02|\x00|\x01\x17\x00|\x02\x13\x00}\x03t\x00|\x03\x83\x01S\x00'
    co_consts = (None, 3)
    co_filename = <stdin>
    co_firstlineno = 1
    co_flags = 67
    co_freevars = ()
    co_kwonlyargcount = 0
    co_lnotab = b'\x00\x01\x04\x01\x0c\x01'
    co_name = wadd
    co_names = ('abs',)
    co_nlocals = 4
    co_stacksize = 2
    co_varnames = ('x', 'y', 'pow_n', 'result')

To learn what these fields mean, see the documentation of the inspect module. Most of them are self-explanatory, except for co_code and co_lnotab.

The co_code field contains the raw sequence of bytecode instructions. Since Python 3.6, each instruction occupies two bytes: one for the opcode and one for its argument (zero for instructions that take no argument). Arguments larger than 255 are spread over additional EXTENDED_ARG prefix instructions. The co_lnotab field maps bytecode offsets to the corresponding line numbers in the source code.
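To make the two-byte layout concrete, here is a small sketch that walks co_code in steps of two and maps each opcode byte to its name through dis.opname. It assumes CPython 3.6 or newer. The helper name decode_wordcode is hypothetical, and EXTENDED_ARG prefixes are listed as separate units rather than merged into the following argument. On Python 3.11+ the listing also contains CACHE units, which the interpreter reserves for its own use.

```python
import dis

def decode_wordcode(code_bytes):
    """Split raw wordcode into (offset, opname, arg) triples.

    Assumes CPython 3.6+ wordcode: every unit is exactly two bytes,
    (opcode, argument). EXTENDED_ARG units are not folded together.
    """
    units = []
    for offset in range(0, len(code_bytes), 2):
        opcode, arg = code_bytes[offset], code_bytes[offset + 1]
        units.append((offset, dis.opname[opcode], arg))
    return units

def wadd(x, y=1):
    pow_n = 3
    result = (x + y) ** pow_n
    return abs(result)

for offset, name, arg in decode_wordcode(wadd.__code__.co_code):
    print(offset, name, arg)
```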

Bytecode disassembling

The standard-library dis module comes in handy when you want to read bytecode in a human-readable form:

>>> import dis
>>> dis.dis(wadd)
  2           0 LOAD_CONST               1 (3)
              2 STORE_FAST               2 (pow_n)

  3           4 LOAD_FAST                0 (x)
              6 LOAD_FAST                1 (y)
              8 BINARY_ADD
             10 LOAD_FAST                2 (pow_n)
             12 BINARY_POWER
             14 STORE_FAST               3 (result)

  4          16 LOAD_GLOBAL              0 (abs)
             18 LOAD_FAST                3 (result)
             20 CALL_FUNCTION            1
             22 RETURN_VALUE

The first column is the line number in the source code (decoded from co_lnotab). The remaining columns show the offset of the instruction in the bytecode, the instruction name, and its argument, followed by a human-readable interpretation of that argument in parentheses (if any).
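The same information is available programmatically. A sketch using dis.get_instructions(), which yields one Instruction object per instruction, and dis.findlinestarts(), which decodes the line-number table into (offset, line) pairs. Exact opcode names differ between Python versions, so no output is shown.

```python
import dis

def wadd(x, y=1):
    pow_n = 3
    result = (x + y) ** pow_n
    return abs(result)

# Offset, name, raw argument and its human-readable form for each instruction
for instr in dis.get_instructions(wadd):
    print(f"{instr.offset:>4} {instr.opname:<20} {instr.arg!s:>5} {instr.argrepr}")

# (offset, line number) pairs decoded from the function's line table
line_starts = list(dis.findlinestarts(wadd.__code__))
print(line_starts)
```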

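A complete list of CPython's instructions can be found in the documentation of the dis module. The actual implementation of each instruction is located in the Python/ceval.c file of the CPython source code (since Python 3.12, instruction bodies are defined in Python/bytecodes.c).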
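The opcode table is also exposed by the dis module itself: dis.opmap maps instruction names to numbers, and dis.opname maps numbers back to names. A quick sketch; the available names vary between releases (for example, Python 3.11 replaced BINARY_ADD and the other per-operator instructions with a single BINARY_OP), so the printed values depend on your interpreter.

```python
import dis

# Name -> number, and number -> name
load_const = dis.opmap["LOAD_CONST"]
print(load_const, dis.opname[load_const])

# Instructions whose argument is an index into co_consts
const_ops = [name for name, op in dis.opmap.items() if op in dis.hasconst]
print(const_ops)

# Which spelling of the add instruction does this interpreter use?
print("BINARY_ADD" in dis.opmap, "BINARY_OP" in dis.opmap)
```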

Bytecode patching

Imagine you have found a bug in someone else's module, but you can't edit the module's files. One solution is to patch its bytecode at runtime!

Code objects are immutable, so we need to create a new one and assign it to the function's __code__ attribute. As an example, let's replace the addition operator in our function with subtraction:

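import dis
from types import CodeType

def fix_function(func, payload):
    """Give func a new code object whose bytecode is `payload`."""
    fn_code = func.__code__
    if hasattr(fn_code, 'replace'):
        # Python 3.8+: copy the code object with only co_code changed
        func.__code__ = fn_code.replace(co_code=payload)
        return
    # Python 3.6/3.7: the CodeType constructor takes every field positionally
    func.__code__ = CodeType(fn_code.co_argcount,
                             fn_code.co_kwonlyargcount,
                             fn_code.co_nlocals,
                             fn_code.co_stacksize,
                             fn_code.co_flags,
                             payload,
                             fn_code.co_consts,
                             fn_code.co_names,
                             fn_code.co_varnames,
                             fn_code.co_filename,
                             fn_code.co_name,
                             fn_code.co_firstlineno,
                             fn_code.co_lnotab,
                             fn_code.co_freevars,
                             fn_code.co_cellvars,
                             )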

payload = wadd.__code__.co_code

# Replace BINARY_ADD (0x17) at offset 8 (see the dis output above) with
# BINARY_SUBTRACT (0x18). Valid for the wordcode of Python 3.6-3.10;
# Python 3.11+ has no BINARY_ADD/BINARY_SUBTRACT opcodes.
subtract_opcode = dis.opmap['BINARY_SUBTRACT'].to_bytes(1, byteorder='little')
payload = payload[:8] + subtract_opcode + payload[9:]

wadd(3, 1)  # The result is: 64
fix_function(wadd, payload)
# Now it computes (x - y) ** 3 instead of (x + y) ** 3
wadd(3, 1)  # The result is: 8
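Hard-coding the offset and the opcode only works on the interpreter version the bytecode was dumped from (Python 3.11, for instance, replaced BINARY_ADD and BINARY_SUBTRACT with a single BINARY_OP instruction whose argument selects the operator). A more version-tolerant sketch, assuming Python 3.8+ for CodeType.replace(): locate instructions with dis.get_instructions() and borrow the exact opcode and argument bytes from tiny reference lambdas. The helper names binop_unit and swap_binop are hypothetical, and the sketch assumes arguments fit in one byte and that both operators share the same instruction layout.

```python
import dis

def binop_unit(func):
    """Return the (opcode, arg) pair of the first binary-operator instruction in func."""
    for instr in dis.get_instructions(func):
        if instr.opname.startswith("BINARY_"):
            return instr.opcode, instr.arg or 0  # arg is None for argument-less opcodes
    raise ValueError("no binary operator found in %r" % func)

def swap_binop(func, old_op, new_op):
    """Rewrite every occurrence of old_op's operator in func with new_op's."""
    old, new = binop_unit(old_op), binop_unit(new_op)
    code = bytearray(func.__code__.co_code)
    for instr in dis.get_instructions(func):
        if (instr.opcode, instr.arg or 0) == old:
            # Overwrite only the 2-byte (opcode, arg) unit; any CACHE units stay put
            code[instr.offset:instr.offset + 2] = bytes(new)
    func.__code__ = func.__code__.replace(co_code=bytes(code))

def wadd(x, y=1):
    pow_n = 3
    result = (x + y) ** pow_n
    return abs(result)

print(wadd(3, 1))  # 64
swap_binop(wadd, lambda a, b: a + b, lambda a, b: a - b)
print(wadd(3, 1))  # 8
```

Because the replacement bytes come from code compiled by the running interpreter, the same call works whether it uses BINARY_SUBTRACT or BINARY_OP.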

Moreover, you can change other fields too: for example, edit constants, change arguments, or replace global lookups with local ones. You can even inject new statements.
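For instance, editing a constant only requires a new co_consts tuple. A minimal sketch assuming Python 3.8+ (CodeType.replace()); greet is a throwaway example function.

```python
def greet():
    return "hello"

code = greet.__code__
# Build a new constants tuple with the string swapped out
new_consts = tuple("goodbye" if c == "hello" else c for c in code.co_consts)
greet.__code__ = code.replace(co_consts=new_consts)
print(greet())  # goodbye
```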

To simplify bytecode editing, you can use third-party libraries such as bytecode and codetransformer.

If you have any questions, feel free to ask them via the e-mail address displayed in the footer.

