How to patch Python bytecode
In standard Python (CPython), running a script first compiles the source code into platform-independent bytecode, which then runs on Python's stack-based virtual machine.
Code objects represent blocks of bytecode. According to the Python documentation, there are three types of blocks: a module, a function body, and a class definition. A code object is produced whenever a block of Python code is compiled, e.g., at startup or later at run time.
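As a rough illustration of the three block types, the sketch below compiles a small made-up source string with the built-in compile() and collects the code objects nested inside the module's code object. The source string and the "<example>" filename are placeholders chosen for this example only.

```python
import types

# Hypothetical source: one module containing a function and a class.
source = (
    "def f():\n"
    "    return 1\n"
    "\n"
    "class C:\n"
    "    x = 2\n"
)

# Compiling in "exec" mode yields the code object of the module block.
module_code = compile(source, "<example>", "exec")

# The function body and the class body are stored as constants
# of the enclosing module code object.
nested = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]

block_names = [module_code.co_name] + sorted(c.co_name for c in nested)
print(block_names)
```

Each of the three names corresponds to one block: the module itself, the class definition, and the function body.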
You can access code objects from Python code. Let's create a function to play with:
def wadd(x, y=1):
    pow_n = 3
    result = (x + y) ** pow_n
    return abs(result)
The code object lives in the function's __code__ attribute. Let's explore its fields:
>>> for attr in dir(wadd.__code__):
...     if attr.startswith('co_'):
...         print("\t%s = %s" % (attr, wadd.__code__.__getattribute__(attr)))
...
    co_argcount = 2
    co_cellvars = ()
    co_code = b'd\x01\x00}\x02\x00|\x00\x00|\x01\x00\x17|\x02\x00\x13}\x03\x00t\x00\x00|\x03\x00\x83\x01\x00S'
    co_consts = (None, 3)
    co_filename = <stdin>
    co_firstlineno = 1
    co_flags = 67
    co_freevars = ()
    co_kwonlyargcount = 0
    co_lnotab = b'\x00\x01\x06\x01\x0e\x01'
    co_name = wadd
    co_names = ('abs',)
    co_nlocals = 4
    co_stacksize = 2
    co_varnames = ('x', 'y', 'pow_n', 'result')
To get an idea of what these fields mean, you can read the documentation of the inspect module. Most of the fields are pretty self-explanatory, except for co_code and co_lnotab. The co_code field contains the sequence of bytecode instructions. Since CPython 3.6, each instruction occupies two bytes, one for the instruction code and one for its argument; the dump above comes from an older interpreter, in which an instruction with an argument takes three bytes. The co_lnotab field maps bytecode offsets to the corresponding lines in the source code.
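To make the co_lnotab encoding more concrete, here is a minimal decoder sketch applied to the bytes from the wadd listing above. It assumes the classic layout of the table, a flat sequence of (offset increment, line increment) byte pairs with unsigned line increments; newer CPython releases treat line increments as signed and later replace the table altogether, so treat this as an illustration of the idea rather than a general-purpose parser. The decode_lnotab name is made up for this example.

```python
def decode_lnotab(lnotab, first_lineno):
    """Return (bytecode offset, source line) pairs encoded in a co_lnotab value.

    Assumption: line increments are small unsigned values, as in the
    sample table below.
    """
    pairs = []
    offset, lineno, last_lineno = 0, first_lineno, None
    # The table is a flat sequence of (offset increment, line increment) bytes.
    for offset_incr, line_incr in zip(lnotab[0::2], lnotab[1::2]):
        if offset_incr:
            # A new offset starts: record the line reached so far.
            if lineno != last_lineno:
                pairs.append((offset, lineno))
                last_lineno = lineno
            offset += offset_incr
        lineno += line_incr
    if lineno != last_lineno:
        pairs.append((offset, lineno))
    return pairs


# co_lnotab and co_firstlineno copied from the wadd listing.
table = decode_lnotab(b'\x00\x01\x06\x01\x0e\x01', first_lineno=1)
print(table)
```

For this table the decoder yields three entries, one for each of the three lines in the body of wadd.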
The dis module comes in handy when you want to read bytecode in a human-readable format:
>>> import dis
>>> dis.dis(wadd)
  2           0 LOAD_CONST               1 (3)
              2 STORE_FAST               2 (pow_n)

  3           4 LOAD_FAST                0 (x)
              6 LOAD_FAST                1 (y)
              8 BINARY_ADD
             10 LOAD_FAST                2 (pow_n)
             12 BINARY_POWER
             14 STORE_FAST               3 (result)

  4          16 LOAD_GLOBAL              0 (abs)
             18 LOAD_FAST                3 (result)
             20 CALL_FUNCTION            1
             22 RETURN_VALUE
The first column is the corresponding line number in the source code (thanks to co_lnotab). The remaining three columns are the offset of the instruction in the bytecode, the instruction name, and the argument, followed by a human-readable representation in parentheses (if any).
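The same columns are also available programmatically. The sketch below uses dis.get_instructions to collect the offset, name, and argument representation of every instruction in wadd. The exact instruction names and offsets depend on the interpreter version, so the output will not necessarily match the listing above.

```python
import dis


def wadd(x, y=1):
    pow_n = 3
    result = (x + y) ** pow_n
    return abs(result)


# Each Instruction carries the fields that dis.dis prints as columns.
rows = [(instr.offset, instr.opname, instr.argrepr)
        for instr in dis.get_instructions(wadd)]

for offset, opname, argrepr in rows:
    print(offset, opname, argrepr)
```

Working with Instruction objects instead of raw bytes is usually a safer way to locate the instruction you want to patch.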
A complete list of CPython's instructions can be found in the documentation of the dis module. The actual implementation of each instruction is located in the ceval.c file.
Imagine there is a bug in someone else's module and you can't edit the module's files. One solution is to patch the bytecode at runtime!
All code objects are immutable, so we need to create a new one. For example, let's replace the add operator in our function:
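import dis
from types import CodeType

def fix_function(func, payload):
    """Replace func's bytecode with payload, keeping every other field."""
    fn_code = func.__code__
    # Positional CodeType signature of Python 3.7 and earlier; Python 3.8
    # added a posonlyargcount parameter right after the argument count.
    func.__code__ = CodeType(fn_code.co_argcount,
                             fn_code.co_kwonlyargcount,
                             fn_code.co_nlocals,
                             fn_code.co_stacksize,
                             fn_code.co_flags,
                             payload,
                             fn_code.co_consts,
                             fn_code.co_names,
                             fn_code.co_varnames,
                             fn_code.co_filename,
                             fn_code.co_name,
                             fn_code.co_firstlineno,
                             fn_code.co_lnotab,
                             fn_code.co_freevars,
                             fn_code.co_cellvars,
                             )

payload = wadd.__code__.co_code
# Replace BINARY_ADD (0x17) with BINARY_SUBTRACT (0x18). The offset is looked up
# rather than hard-coded: it is byte 12 in the co_code dump above, but byte 8
# in the two-byte layout shown in the dis listing.
add_offset = next(i.offset for i in dis.get_instructions(wadd)
                  if i.opname == 'BINARY_ADD')
subtract_opcode = dis.opmap['BINARY_SUBTRACT'].to_bytes(1, byteorder='little')
payload = payload[:add_offset] + subtract_opcode + payload[add_offset + 1:]

wadd(3, 1)  # The result is: 64
# Now it's (x - y) instead of (x + y)
fix_function(wadd, payload)
wadd(3, 1)  # The result is: 8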
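On newer interpreters the same patch needs a slightly different shape. The sketch below is one possible variant, assuming Python 3.8 or later, where code objects offer a replace() method that copies every field except the ones you override. It also assumes that on Python 3.11 and later addition and subtraction share a single BINARY_OP instruction whose argument selects the operator, with 0 for addition and 10 for subtraction; those two values are taken from CPython's internal operator table and should be treated as an assumption. The swap_add_for_subtract helper is a made-up name.

```python
import dis


def wadd(x, y=1):
    pow_n = 3
    result = (x + y) ** pow_n
    return abs(result)


def swap_add_for_subtract(func):
    """Patch the first addition in func into a subtraction (sketch)."""
    code = func.__code__
    payload = bytearray(code.co_code)
    for instr in dis.get_instructions(code):
        if instr.opname == "BINARY_ADD":
            # Python 3.10 and earlier: swap the opcode byte itself.
            payload[instr.offset] = dis.opmap["BINARY_SUBTRACT"]
            break
        if instr.opname == "BINARY_OP" and instr.arg == 0:
            # Python 3.11 and later (assumed): the operator lives in the
            # argument byte, 0 for addition and 10 for subtraction.
            payload[instr.offset + 1] = 10
            break
    else:
        raise ValueError("no addition found in %s" % func.__name__)
    # replace() builds a new code object; the original stays untouched.
    func.__code__ = code.replace(co_code=bytes(payload))


before = wadd(3, 1)
swap_add_for_subtract(wadd)
after = wadd(3, 1)
print(before, after)
```

Looking up the instruction by name keeps the patch independent of the exact byte offset, which changes between interpreter versions.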
Moreover, you can change other fields too. For example, you can edit constants and arguments, or replace globals with locals. You can even add new statements.
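As a small example of editing a field other than the bytecode, the sketch below swaps a string constant in co_consts. It uses a made-up greet function rather than wadd, and it assumes Python 3.8 or later for the replace() method of code objects.

```python
def greet(name):
    return "Hello, " + name


code = greet.__code__

# Build a new constants tuple with one value swapped out.
# Positions are preserved, so the bytecode does not need to change.
new_consts = tuple("Goodbye, " if const == "Hello, " else const
                   for const in code.co_consts)

before = greet("Bob")
greet.__code__ = code.replace(co_consts=new_consts)
after = greet("Bob")
print(before, "->", after)
```

Because the instructions refer to constants by index, keeping the tuple the same length is enough for the existing bytecode to pick up the new value.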
To simplify the process of editing bytecode you can use special modules, such as bytecode and codetransformer.