Example walk through
Let us take a look at how a simple program is executed in the Python world.
Save this program in a file named test.py and run it with Python; you should see the output below.
python test.py
3
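The original listing is not reproduced in this text, so here is a reconstruction for following along. It is an assumption based on the description later in this walk through (four lines, the first being x = 1, an output of 3, and the instructions LOAD_CONST, STORE_NAME, LOAD_NAME, BINARY_ADD, CALL_FUNCTION); the names y and z are made up for illustration.

```python
# test.py (assumed reconstruction; only "x = 1" and the output 3 come from the text)
x = 1
y = 2          # hypothetical second line
z = x + y      # hypothetical third line, compiles to an addition
print(z)       # prints 3
```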
Now, in order to understand what happened in the background (the purple box), let us take a first step and disassemble this code. The source file you wrote (test.py) was compiled into bytecode, so let us take a look at that. Before we look at the bytecode itself, start the Python interpreter in your console and execute the following commands.
What happened here is…
- We compiled test.py and stored the resulting object in a variable, c.
- When we print c, the output tells us that it is a code object, shows the memory address where that object lives, and names the file and line (line 1) it was compiled from.
- This code object c has many properties. If you want to see all of them, execute dir(c) in the Python interpreter. One such property is co_code. We printed it in this step and got a bytes literal full of hex-looking escape sequences, each introduced by \x. This is the real bytecode, which is then fed to the interpreter.
- In the next step we check the length of this bytecode.
- Here we check the type of co_code, and it clearly says 'bytes'.
- This command turns the bytecode into a list of integers, one per byte. Strictly speaking these are plain byte values between 0 and 255 rather than ASCII character codes, although values below 128 coincide with ASCII, which is why some bytes showed up as letters in the earlier step. I know what you are thinking here; for now just hold that thought and assume that this is what is executed by the interpreter.
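The six steps above can be sketched as follows. The exact commands from the original session are not shown in this text, so this is an approximation: the source string stands in for the contents of test.py (an assumed reconstruction), and the built-in compile() is used to obtain the code object.

```python
# Assumption: this string stands in for the contents of test.py.
source = "x = 1\ny = 2\nz = x + y\nprint(z)\n"

c = compile(source, "test.py", "exec")  # step 1: compile to a code object
print(c)                                # step 2: repr shows address, file and line
print(c.co_code)                        # step 3: the raw bytecode as a bytes literal
print(len(c.co_code))                   # step 4: number of bytes of bytecode
print(type(c.co_code))                  # step 5: <class 'bytes'>
print(list(c.co_code))                  # step 6: one integer (0-255) per byte
```

The byte values and the length depend on the CPython version, so your numbers may differ from the ones quoted in this walk through.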
So we have travelled from the source file test.py, which contained the Python code, to the bytecode, and then viewed that bytecode as a list of integers. Let's disassemble this code now. The Python source distribution ships a disassembler in its standard library, at Lib > dis.py. Feel free to take some time to go through it and maybe connect it back to the screenshot below.
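If you want to produce the disassembly yourself rather than rely on the screenshot, a minimal sketch using the standard dis module looks like this. The source string is an assumed stand-in for test.py, and the listing you get depends on your CPython version.

```python
import dis

# Assumption: this string stands in for the contents of test.py.
source = "x = 1\ny = 2\nz = x + y\nprint(z)\n"
code = compile(source, "test.py", "exec")

# Prints one row per instruction: source line, byte offset,
# instruction name, argument, and the resolved argument in parentheses.
dis.dis(code)
```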
Alright, very first impression: this looks like assembly code. It is a kind of assembly code for the Python interpreter, but it is more human readable. Our program had 4 lines of code, and the compiler converted them into these lines of bytecode. The first column from the left shows the line number in your source file that each group of instructions came from. As we can see, the first line, x = 1, is converted into 2 bytecode instructions, and the remaining 3 lines are handled similarly. The first column helps you correlate your source code with the bytecode.
The second column is the byte offset of each instruction, that is, the position within co_code where the instruction starts. The first instruction starts at 0 and is 2 bytes long, so the next instruction starts at 2, and so on. The last instruction (RETURN_VALUE) starts at 26, which means the bytecode ends at byte 28. This matches step 4 in the previous section, where we checked the length of co_code after compilation and it correctly reported 28.
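To check the relationship between offsets and the total length yourself, here is a small sketch using dis.get_instructions. The source string is an assumed stand-in for test.py; the figures 26 and 28 apply to the CPython version this walk through was written against and may differ on yours.

```python
import dis

# Assumption: this string stands in for the contents of test.py.
source = "x = 1\ny = 2\nz = x + y\nprint(z)\n"
code = compile(source, "test.py", "exec")

instructions = list(dis.get_instructions(code))
for ins in instructions:
    # offset is where the instruction starts inside code.co_code
    print(ins.offset, ins.opname)

last = instructions[-1]
print("last instruction starts at:", last.offset)
print("total bytecode length:", len(code.co_code))
```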
The third column is the name of the actual instruction. Our program makes use of the instructions LOAD_CONST, STORE_NAME, LOAD_NAME, BINARY_ADD, CALL_FUNCTION, POP_TOP and RETURN_VALUE. These tell the Python interpreter what to do at each step. The interpreter is really simple: it only executes the current instruction and knows nothing about what is coming next. However, we shall get into the details of this a bit later. Remember I asked you to hold your thoughts at the ASCII codes topic? Now that you know these instructions, this is the right time to take a look at the Python source file Include > opcode.h. If you can't go through it right now, here's a snippet of how it looks. As you can see, there are some familiar opcodes which we just came across. For example, LOAD_CONST is associated with the number 100. That number is an opcode, not really a character code, although 100 also happens to be the ASCII code of the letter d. Spend a few minutes and let the circle complete here for you.
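The same name-to-number mapping can be inspected from Python without opening opcode.h, via the dis module's opmap and opname tables. This is only a quick sketch; the number printed depends on your CPython release, so it may not be 100.

```python
import dis

# dis.opmap maps instruction names to their numeric opcodes
load_const = dis.opmap["LOAD_CONST"]
print(load_const)

# dis.opname is the reverse lookup: opcode number -> instruction name
print(dis.opname[load_const])
```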
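The fourth column is the argument of the instruction. For most of the instructions in our program it is an index into a table stored on the code object (the constants or the names, for example), and the instruction uses it to decide what to push onto or pop from the value stack. The Python interpreter maintains something called a value stack, and we shall learn more about it in upcoming sections where we discuss frames and scopes. The same applies to the values you can see in parentheses in the fifth column. Just to give you an overview, they show what the interpreter is actually dealing with in that particular instruction, such as the constant or the name that the argument refers to. The Python disassembler puts this together nicely for us humans.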
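To see how the last two columns relate to each other, this sketch prints the code object's constant and name tables next to each instruction's raw argument and its human-readable form. The source string is an assumed stand-in for test.py, and the exact instructions depend on your CPython version.

```python
import dis

# Assumption: this string stands in for the contents of test.py.
source = "x = 1\ny = 2\nz = x + y\nprint(z)\n"
code = compile(source, "test.py", "exec")

print("constants:", code.co_consts)  # table that constant-loading arguments index into
print("names:", code.co_names)       # table that name-based arguments index into

for ins in dis.get_instructions(code):
    # arg is the raw number (fourth column); argrepr is the text in parentheses
    print(ins.opname, ins.arg, ins.argrepr)
```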