Intro and Installation
This is a fairly large topic. Everybody learns Python because it is an easy language, and thanks to the AI/ML ecosystem it is becoming an essential language to learn. Of course, Python has plenty of other uses as well. Learning the language is easy, but to understand it even better, the next step is to understand how its interpreter is actually implemented. Python is open source, so you can go through its implementation at your own pace. I am writing this blog post to present a very high-level overview of how Python works internally. Having taken a look at its internals, I now know the language better, even though I haven’t written anything beyond “Hello World”.
This post is inspired by a great video series by Philip Guo, along with my own desire to explore and know the language better. It is an attempt to give some initial direction for exploring the internal workings of Python, and its scope is nowhere near that of Philip’s series. I have included some of the nuances I observed while going through the learning process. If this inspires you to take a deeper look into the topic, I highly recommend going through his 10-hour walkthrough.
You must have heard that Python is an interpreted language. That is true, but only partially. In most languages, the source code (in our case, .py files) is first converted into something closer to the machine, and that intermediate form is then executed to produce the output. Before we go any further, let us get a very high-level, end-to-end picture of how Python works and what it consists of.
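To make those two stages concrete, here is a minimal sketch using the built-in `compile()` and `exec()` functions. `compile()` turns source text into a code object that holds bytecode, and `exec()` hands that code object to the interpreter to run. The source string and the `"<example>"` filename are only illustrative.

```python
# Source code as a plain string, i.e. what a .py file would contain.
source = "x = 6 * 7\nresult = x + 1\n"

# Stage 1: compile the source into a code object (this holds the bytecode).
code_obj = compile(source, "<example>", "exec")
print(type(code_obj).__name__)          # code
print(type(code_obj.co_code).__name__)  # bytes: the raw bytecode

# Stage 2: hand the code object to the interpreter to execute it.
namespace = {}
exec(code_obj, namespace)
print(namespace["result"])  # 43
```

Every time you run a .py file, Python performs these same two steps for you behind the scenes.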
Referring to the diagram here, the large purple box named “Python” represents, well, the Python executable. When you write a program in Python, it is fed to this executable, and if everything goes well, it produces the output. The Python executable itself has some surprising components. Within this box, your source code is first processed by a compiler to produce something called bytecode, which is then interpreted by one of the Python interpreters to produce the real-world output. The compiler’s implementation is fairly generic, and honestly that is a different topic altogether. Bytecode is a lower-level representation that is closer to the machine than your source code. In the upcoming sections we will see that it looks a lot like assembly language, but it’s not: it sits somewhere between source code and assembly language, and it is still very much human-readable. The interpreter is the main topic here. This is where all the magic happens. As you can see in the image, the interpreter has various implementations. The official implementation is CPython, which is written in C. Below are some others.
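You can inspect this bytecode yourself with the standard library’s `dis` module. The sketch below disassembles a tiny function. The exact opcode names vary between Python versions (for example, older versions use `BINARY_ADD` where newer ones use `BINARY_OP`), so your output may differ from someone else’s.

```python
import dis

def add(a, b):
    # A tiny function so the bytecode stays short.
    return a + b

# Print the human-readable, assembly-like disassembly of add()'s bytecode.
dis.dis(add)

# The same instructions as data: just the opcode names, in order.
opnames = [instr.opname for instr in dis.get_instructions(add)]
print(opnames)
```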
- CPython - the official implementation, also available on python.org.
- Jython - a Python interpreter written in Java.
- PyPy - a Python interpreter written in Python itself (more precisely, in RPython, a restricted subset of Python).
- IronPython - an implementation of Python closely integrated with the .NET framework.
- Skulpt - this is a cool one: an implementation of Python that runs in the browser!
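If you are unsure which implementation is running your code, the standard library can tell you. The small sketch below uses `platform.python_implementation()` and `sys.implementation`. Note that `sys.implementation` only exists on Python 3.3 and later, so older interpreters such as Jython 2.7 do not have it.

```python
import platform
import sys

# Name of the interpreter running this script, e.g. "CPython" or "PyPy".
print(platform.python_implementation())

# The same information, lowercased, from the sys module (e.g. "cpython").
print(sys.implementation.name)

# Version of the Python language this interpreter implements.
print(platform.python_version())
```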
As mentioned earlier, Python is an open source language. It was created about 30 years ago by a programmer named Guido van Rossum. You can find the source code of its official implementation (CPython) on the python.org website, or you can clone the GitHub repository. The steps to compile this source code into a Python executable can be found here. The beauty of open source is that you can alter this source code to suit your needs and compile your own version of the interpreter. Or you might even write a Python interpreter completely from scratch, perhaps in your favorite esoteric programming language!
Python Source Code Tree
Looking at the Python source code can be overwhelming at times. There are a lot of .h files, and it is easy to get lost because it is difficult to know where to start. For starters, here are a few important directories you may want to glance at first.
- Include - this directory contains the header files that declare the interfaces for objects. This is what the outside world sees.
- Objects - contains the .c files that implement Python’s object types. Each file represents a specific type of object used in Python programs (for example, listobject.c implements lists).
- Python - this is the main runtime of Python.
- Lib - the standard library modules, which form a pretty big collection. The great part is that so much functionality is provided out of the box. As a developer you just import it into your program instead of reinventing the wheel.
- Python/ceval.c - this file contains the main interpreter loop (see the image below).
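You can see the difference between Lib and Objects from inside Python. A standard library module such as `json` has a .py source file you can open. A built-in type such as `list` is implemented in C, so `inspect` cannot find any Python source for it. The sketch below assumes a standard CPython install in which the standard library ships as source files.

```python
import inspect
import json

# json is part of the standard library and ships as plain Python source (Lib/).
print(inspect.getsourcefile(json))  # a path to __init__.py inside the json package

# list is a built-in type implemented in C (Objects/listobject.c in a CPython
# checkout), so there is no Python source file for inspect to locate.
try:
    inspect.getsourcefile(list)
except TypeError as err:
    print("no Python source:", err)
```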
I would suggest spending some time looking around the source code files.
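The main loop in ceval.c is long, but the core idea is simple. The loop fetches the next bytecode instruction, dispatches on its opcode, and pushes or pops values on a stack. Below is a toy sketch of that fetch-dispatch-execute idea, written in Python itself. The opcodes (`PUSH`, `ADD`, `MUL`, `RETURN`) and the `run()` function are hypothetical and far simpler than CPython’s real instruction set. They are meant only to show the general shape of the loop.

```python
# Toy stack-based bytecode interpreter (hypothetical opcodes, for illustration only).

def run(instructions):
    stack = []
    pc = 0  # "program counter": index of the next instruction to execute
    while pc < len(instructions):
        op, arg = instructions[pc]  # fetch
        pc += 1
        if op == "PUSH":            # dispatch on the opcode
            stack.append(arg)
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif op == "RETURN":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")
    return None

# (2 + 3) * 4 expressed as a tiny stack-based program
program = [
    ("PUSH", 2),
    ("PUSH", 3),
    ("ADD", None),
    ("PUSH", 4),
    ("MUL", None),
    ("RETURN", None),
]
print(run(program))  # 20
```

Compare this with the `dis` output of a real function: CPython’s instructions also load values onto a stack and then combine them, only with many more opcodes and optimizations.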