
Python Internals (P2) - Example Walk through

Sumeet · Feb 24, 2020 · 5 mins read

Example walk through

Let us take a look at how a simple program is executed in the Python world.

Compilation

[Screenshot: Simple Python Program (the contents of test.py)]

Save this program in a file named test.py, run it with Python, and you should see the output below.

python test.py
3
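The program itself appears only as a screenshot. The rest of this walk-through mentions four lines, a first line of x = 1, loads and stores of names, a print call and an output of 3. Here is a hypothetical reconstruction consistent with those details. It is an assumption, not necessarily the author's exact file:

```python
# test.py: hypothetical reconstruction of the four-line program
# (assumed; the original was shown only as a screenshot)
x = 1
y = 2
z = x + y
print(z)  # prints 3
```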

Now, to understand what happened in the background (the purple box), let us take a first step and disassemble this code. The source file you wrote (test.py) was compiled into bytecode, so let us take a look at that bytecode. Before we do, start the Python interpreter in your console and execute the following commands.

[Screenshot: interpreter session compiling test.py and inspecting the resulting code object]

Here is what happened:

  1. We compiled the contents of test.py and stored the resulting code object in a variable c.
  2. When we print c, the output shows that it is a code object, the memory address where that object lives, and the line of test.py where its code starts.
  3. The code object c has many attributes, and you can list them all by running dir(c) in the interpreter. One of them is co_code. Printing it gives a bytes literal: a run of \x.. hex escape sequences, with any byte that happens to be a printable ASCII character shown as that character. This is the actual bytecode that the interpreter executes.
  4. Next we check the length of this bytecode, which comes out as 28 bytes.
  5. Then we check the type of co_code, and it is bytes.
  6. Finally, we convert co_code into a list to see the value of each byte as an integer between 0 and 255. These are the numbers behind the ASCII-looking characters from step 3. I know what you are thinking here. For now, just hold this thought and assume that this sequence of numbers is what the interpreter executes.
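The commands were also shown only in a screenshot, so here is a sketch of an equivalent session. It assumes a hypothetical four-line test.py (x = 1, y = 2, z = x + y, print(z)) and that step 6 used list(). The source is embedded as a string so the sketch runs on its own. With the real file you would compile open("test.py").read() instead.

```python
# Hypothetical contents of test.py (assumed, see the lead-in)
source = "x = 1\ny = 2\nz = x + y\nprint(z)\n"

# Step 1: compile the source into a code object (nothing is executed yet)
c = compile(source, "test.py", "exec")

print(c)                # <code object <module> at 0x..., file "test.py", line 1>
print(c.co_code)        # step 3: the raw bytecode as a bytes literal
print(len(c.co_code))   # step 4: how many bytes of bytecode there are
print(type(c.co_code))  # step 5: <class 'bytes'>
print(list(c.co_code))  # step 6: every byte as an integer from 0 to 255
```

On CPython 3.6 to 3.10, whose disassembly uses BINARY_ADD and CALL_FUNCTION as in this article, step 4 should report the same 28 bytes. Newer versions emit different instructions, so the numbers will differ.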

Bytecode

So we have travelled from the source file test.py, which contained our Python code, to its bytecode, and then looked at that bytecode as a list of numbers. Let's disassemble it now. Python ships with a disassembler in its standard library: the dis module.

Path: Lib/dis.py in the CPython source tree. Feel free to take some time to go through it and connect it back to the screenshot below.
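A minimal sketch of producing a disassembly like the one in the screenshot uses dis.dis on the compiled code object. It assumes the same hypothetical four-line program (x = 1, y = 2, z = x + y, print(z)). The exact instructions you see depend on your Python version.

```python
import dis

# Hypothetical contents of test.py (assumed, see the lead-in)
source = "x = 1\ny = 2\nz = x + y\nprint(z)\n"
c = compile(source, "test.py", "exec")

# Print one row per instruction: line number, byte offset, name, argument, (meaning)
dis.dis(c)
```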

[Screenshot: dis output for test.py]

Alright, first impression: this looks like assembly code. It is a kind of assembly language for the Python interpreter, but more human readable. Our program had 4 lines of code, and the compiler turned them into the 14 bytecode instructions shown here. The first column from the left is the line number in your source file that the instructions next to it came from. For example, the first line, x = 1, became 2 bytecode instructions, and each of the remaining 3 lines maps to its own group of instructions in the same way. So the first column helps you correlate your source code with the bytecode.
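The first column's information is also available programmatically. dis.findlinestarts yields (offset, line) pairs marking where each source line's instructions begin. The sketch below assumes the same hypothetical four-line program. On Python 3.11 and later you may also see a line 0 entry for the compiler-inserted RESUME instruction.

```python
import dis

# Hypothetical contents of test.py (assumed, see the lead-in)
source = "x = 1\ny = 2\nz = x + y\nprint(z)\n"
c = compile(source, "test.py", "exec")

# Keep the first byte offset at which each source line's instructions begin
first_offset = {}
for offset, line in dis.findlinestarts(c):
    first_offset.setdefault(line, offset)

for line, offset in first_offset.items():
    print(f"source line {line}: instructions start at byte offset {offset}")
```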

The second column is the byte offset of each instruction within co_code. In the Python version used here, every instruction is 2 bytes long: one byte for the opcode and one for its argument. The first instruction starts at offset 0, so the next one starts at offset 2, and so on. The last instruction (RETURN_VALUE) starts at offset 26, so the bytecode ends at byte 28. This matches step 4 in the previous section, where checking the length of the compiled co_code correctly gave 28.
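To see where these offsets come from, here is a simplified sketch that walks co_code two bytes at a time and names each opcode byte with dis.opname. It assumes the same hypothetical four-line program. It is not a full disassembler: it does not combine EXTENDED_ARG prefixes, and on Python 3.11+ the inline CACHE entries show up as their own rows.

```python
import dis

# Hypothetical contents of test.py (assumed, see the lead-in)
source = "x = 1\ny = 2\nz = x + y\nprint(z)\n"
code = compile(source, "test.py", "exec").co_code

# Each 2-byte unit is (opcode byte, argument byte), so offsets go 0, 2, 4, ...
for offset in range(0, len(code), 2):
    opcode, arg = code[offset], code[offset + 1]
    print(f"{offset:3}  {dis.opname[opcode]:15} {arg}")

print("total bytes:", len(code))
```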

The third column is the instruction itself. Our program uses LOAD_CONST, STORE_NAME, LOAD_NAME, BINARY_ADD, CALL_FUNCTION, POP_TOP and RETURN_VALUE. These instructions tell the Python interpreter what to do next. The interpreter itself is really simple: it executes only the current instruction and knows nothing about what is coming next. We shall get into the details of this a bit later. Remember the thought I asked you to hold about the list of numbers? Now that you know these instructions, this is the right time to look at the CPython source file Include/opcode.h. If you can't go through it right now, here's a snippet of how it looks. You will spot some familiar opcodes we just came across. For example, LOAD_CONST is defined as opcode 100, which is also the ASCII code of the character 'd'. That is why the letter d shows up in the printed co_code wherever a LOAD_CONST sits. Spend a few minutes and let the circle complete here for you.

[Screenshot: snippet of Include/opcode.h]
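The same name-to-number mapping can be read from Python itself through dis.opmap, without opening the C header. This sketch looks up the instructions named in this article. Opcode numbers change between CPython releases, and BINARY_ADD and CALL_FUNCTION no longer exist in recent versions, so the loop reports those as missing rather than failing.

```python
import dis

# Instruction names that appear in the article's disassembly
names = ["LOAD_CONST", "STORE_NAME", "LOAD_NAME", "BINARY_ADD",
         "CALL_FUNCTION", "POP_TOP", "RETURN_VALUE"]

for name in names:
    number = dis.opmap.get(name)  # None if this Python version dropped the opcode
    if number is None:
        print(f"{name:15} not defined in this Python version")
    else:
        # chr() shows which character the opcode byte prints as inside co_code
        print(f"{name:15} {number:3}  chr={chr(number)!r}")
```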

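The fourth column is the instruction's argument. For LOAD_CONST it is an index into the code object's tuple of constants (co_consts). For LOAD_NAME and STORE_NAME it is an index into its tuple of names (co_names), and for CALL_FUNCTION it is the number of arguments being passed. The fifth column, in brackets, is the disassembler's human-readable interpretation of that argument: the actual constant or name the instruction is dealing with. The Python interpreter also maintains something called a value stack, which these instructions push values onto and pop values off. We shall learn more about it in upcoming sections where we discuss frames and scopes. The bracketed values are the Python disassembler's way of putting it all together nicely for us humans.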
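To connect the fourth and fifth columns, this sketch resolves each LOAD_CONST, LOAD_NAME and STORE_NAME argument by hand, using co_consts and co_names. It assumes the same hypothetical four-line program. On very recent CPython releases, small integer constants may be loaded by a different instruction, so fewer LOAD_CONST rows may appear.

```python
import dis

# Hypothetical contents of test.py (assumed, see the lead-in)
source = "x = 1\ny = 2\nz = x + y\nprint(z)\n"
c = compile(source, "test.py", "exec")

print("co_consts:", c.co_consts)  # constants the bytecode refers to
print("co_names: ", c.co_names)   # names the bytecode refers to

for ins in dis.get_instructions(c):
    if ins.opname == "LOAD_CONST":
        # The argument (fourth column) indexes co_consts; the result is the bracketed value
        print(f"{ins.opname:12} arg={ins.arg} -> co_consts[{ins.arg}] = {c.co_consts[ins.arg]!r}")
    elif ins.opname in ("LOAD_NAME", "STORE_NAME"):
        # The argument indexes co_names
        print(f"{ins.opname:12} arg={ins.arg} -> co_names[{ins.arg}] = {c.co_names[ins.arg]!r}")
```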

Written by Sumeet
Hi, I am Sumeet, and I believe the world belongs to the doers. Here, I publish my technical tinkering experiences. I hope you like it!