Wednesday, June 13

Learning Python 1: Internals, Objects, Types

Having used and loved Python for 2+ years, I still find my Python knowledge missing a lot of basic points. At this uncertain intersection of job hunting and continuing my PhD, I have decided to learn it again and try to build a solid foundation. By the way, I moved my blog from Tumblr to Blogger, simply because I realized Tumblr is really a place to share pictures and videos, not long, dull self-rants. Blogger simply feels more like a blog.

1. Internals

The way Python works is different from a compiled, statically typed language like C. This is a great illustration:
The Python interpreter does this whole job; in the most widely used implementation, which is actually called CPython, the interpreter is written in C. It first compiles the .py source into byte code, and for imported modules it saves that byte code in a .pyc file (compiled Python source). The byte code is then executed by the Python Virtual Machine (PVM). Saving the .pyc file is there to speed up startup, like a cache: if the source has not changed, the compile step can be skipped next time. The thing here is, although the interpreter also compiles the file, it only compiles it down to byte code, which the PVM then executes one instruction at a time. Compared with C, whose compiler turns C code into machine code that the CPU runs directly, Python loses the speed race mainly because of this.
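
To peek at the byte code step described above, the standard library's dis module can disassemble a compiled snippet. This is only a sketch: the snippet a = 3 and the file name "<example>" are placeholders I made up, and the exact instruction names differ between CPython versions.

```python
import dis

# Compile a one-line program to a code object (byte code) without running it.
code = compile("a = 3", "<example>", "exec")

# The raw byte code is a bytes string stored on the code object.
print(type(code.co_code))

# Each entry is one instruction the virtual machine will execute.
instructions = list(dis.get_instructions(code))
for ins in instructions:
    print(ins.opname, ins.argrepr)

# Executing the code object is the PVM's part of the job.
namespace = {}
exec(code, namespace)
print(namespace["a"])  # 3
```

The printed instruction list is short, but it shows that what actually runs is a sequence of small byte code operations, not the source text.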

2. Everything is an object

Python shares this idea with Ruby (which makes it look like I am a Ruby ace trying to learn Python; sadly that is not the case). In Ruby, if I remember correctly, everything you use is an object. a.b is just calling method b on object a, and even a.b = 3 is calling method b= on object a with argument 3 (I thought this was pretty cool when I first saw it). Ruby is like an extreme case, as if it were saying: in my world, only objects and their methods exist.

However, Python is a little bit different. For a = 3, what Python actually does is: (1) create an object (allocating some memory) to represent the value 3; (2) create the variable a if it does not already exist; (3) point a to the object 3. The difference is that a itself is not an object but a name holding a reference (a pointer, if you like) to an object. Similarly, in b = [1,2,3,4], [1,2,3,4] is a list object and b is just a variable pointing to it. So in Python, when you type a = 3, you are not calling a method but linking a name to an object. Still, this gives Python a lot of dynamism: you don't declare the type of a variable; it is decided only by the object the variable is pointing to.
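
The three steps above can be checked from the interpreter. A minimal sketch, where the extra name c is my own addition for illustration:

```python
a = 3
print(type(a).__name__)   # int: the type belongs to the object, not to the name

b = [1, 2, 3, 4]
c = b                     # no copy is made, c is a second name for the same list
print(c is b)             # True
print(id(c) == id(b))     # True: both names lead to one object
```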

The thing is, when you do something like this:
a = 1233445678
a = 'spam' 
Now the variable is pointing to the new object, so where did the first object go? The answer: it is reclaimed by Python's automatic garbage collection. In CPython, every object carries a counter of how many references (variables and other objects) point to it. When the counter drops to zero, the object is erased immediately and the memory it takes up is reclaimed. There is one exception: CPython keeps small integers (and some strings) cached in memory so that they can be reused. The cache itself keeps them alive, so they are still there even when none of your variables refer to them. Considering those objects are used more often, it is reasonable to do so.
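
The reference counter can be watched directly on CPython. This is a sketch under that assumption (other Python implementations manage memory differently), and the class Spam is a throwaway name I made up so that a weak reference can be taken:

```python
import sys
import weakref


class Spam:
    """Throwaway class used only for this example."""


obj = Spam()
before = sys.getrefcount(obj)   # includes a temporary reference made by the call
alias = obj                     # one more name pointing at the same object
after = sys.getrefcount(obj)
print(after - before)           # 1

# A weak reference lets us observe the object without keeping it alive.
watcher = weakref.ref(obj)
del alias
del obj                         # the last strong reference is gone
print(watcher() is None)        # True on CPython: the object was reclaimed at once
```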

There is a thing called a shared reference that is worth mentioning. It is also a result of pointers.
Say you write a = 3 and then b = a. At that point a and b refer to the same object 3. If you then assign a new object to a (for example a = 'spam'), b is still pointing to 3. Also, if you do something like this:
a = 3
b = a
b += 2
b turns into 5, but a is still 3. You started with one object, 3; after the three lines of code you have two objects, 3 and 5, pointed to by a and b respectively.
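
Both cases of shared references described here can be run as one short script. A minimal sketch, reusing the names a and b from the text:

```python
a = 3
b = a
print(a is b)   # True: two names, one object

a = 'spam'      # rebinding a does not touch b
print(b)        # 3

a = 3
b = a
b += 2          # ints are immutable, so b is bound to a new object 5
print(a, b)     # 3 5
```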

Lastly, one more thing about objects. There are two ways to compare two variables in Python: == and is. The first checks whether their values are equal; the latter checks whether they point to the same object. Thus, you would expect the following two situations to give different results:
L = [1, 2, 3]
R = [1, 2, 3]    # second situation: R = L
L == R           # True in both situations
L is R           # False for the separate list, True when R = L
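However, if L and R refer to a small integer or a short string, then == and is may both return True in either situation. That is because CPython caches small integers (and interns some strings), so that for one value there is only one object. This is an implementation detail, so it is safer not to rely on is when you mean to compare values.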
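
Here is a small sketch of the difference between == and is. The name alias and the integer variables are my own additions; the integers are built with int() at run time so that the compiler cannot merge identical constants, and their identity results are CPython behaviour, not a language guarantee.

```python
L = [1, 2, 3]
R = [1, 2, 3]      # equal value, separate object
alias = L          # same object under a second name

print(L == R, L is R)          # True False
print(L == alias, L is alias)  # True True

small_x, small_y = int("7"), int("7")
big_x, big_y = int("100000"), int("100000")
print(small_x is small_y)   # True on CPython because of the small-integer cache
print(big_x is big_y)       # typically False on CPython; do not rely on either
```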

3. Built-in types

Numbers, Strings, Lists, Dictionaries, Tuples, Files and others. Among these, only lists and dictionaries are mutable, i.e. when you manipulate them, you actually change the object that the variable refers to; for the other types, Python creates a new object to hold the result of your manipulation. Even an integer, e.g. 3, is immutable. Say you have a = 3; if you then type a = 5, the object 3 itself is not changed; a simply points to a new object 5.
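
To make the mutable versus immutable split concrete, here is a minimal sketch; the variable names are made up for the example:

```python
nums = [1, 2, 3]
same = nums
nums.append(4)            # changes the list object in place
print(same)               # [1, 2, 3, 4]: the other name sees the change
print(same is nums)       # True

word = 'spam'
shout = word.upper()      # strings are immutable, so a new string comes back
print(word, shout)        # spam SPAM

point = (1, 2)
try:
    point[0] = 9          # tuples do not support item assignment
except TypeError as err:
    print('TypeError:', err)
```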

Python uses dynamic typing: variables themselves do not have types, only objects do. So you don't declare what type a variable is; the type is known only at run time, from the object the variable currently refers to. After a = 3, a refers to an integer; after a = 'spam', the same a refers to a string.
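
A quick way to see dynamic typing is to rebind one name to objects of different types and ask each object for its type. A minimal sketch; the names first, second and third are only there to hold the results:

```python
a = 3
first = type(a).__name__
a = 'spam'                    # same name, now bound to a string object
second = type(a).__name__
a = [1, 2, 3]                 # and now to a list object
third = type(a).__name__
print(first, second, third)   # int str list
```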

But Python is strict about which operations and methods can be used on which type of object (it is strongly typed). You cannot call methods that do not belong to an object's type. Some operations span several types, like indexing with [] or +. But types also have their type-specific methods, and you cannot use them on other types.
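
A short sketch of both sides of this rule: operations shared by several types, and the errors you get when a type does not support something. The flag names are made up for the example.

```python
# + and [] work on several sequence types.
print('spam' + 'eggs')         # spameggs
print([1, 2] + [3])            # [1, 2, 3]
print('spam'[0], [1, 2][0])    # s 1

# Mixing types that do not belong together is refused, not silently converted.
mixing_failed = False
try:
    'spam' + 3
except TypeError as err:
    mixing_failed = True
    print('TypeError:', err)

# upper() is a string method; lists do not have it.
missing_method = False
try:
    [1, 2].upper()
except AttributeError as err:
    missing_method = True
    print('AttributeError:', err)
```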
