Wednesday, August 22

iterate, iterate, iterate!

Gather around and learn some iteration stuff again. Great, great tutorials here, here, and especially here.

1. iter()
This is a built-in function that takes in an object and returns its iterator. It corresponds to the __iter__() method in the class definition: iter(obj) calls obj.__iter__().
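A minimal usage sketch in Python 3, where the built-in next() function calls the iterator's __next__() method (in Python 2 the method itself is named next()):

```python
numbers = [10, 20, 30]

# iter() asks the list for its iterator; same as numbers.__iter__()
it = iter(numbers)

print(next(it))  # 10
print(next(it))  # 20
print(next(it))  # 30
```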

An iterator supports the next() method (named __next__() in Python 3), which returns the next element of the object each time it is called. An object that supports both __iter__() and next() is called an iterator. There are two different scenarios when we call __iter__() on an object:

  1. The class itself implements both next() and __iter__(). In this case __iter__() most likely just returns the object itself: the object is its own iterator;
  2. __iter__() returns an instance of another class, which must be an iterator class. In this case the object whose __iter__() is called does not implement next(); only the iterator class has next().
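A minimal sketch of both scenarios, written for Python 3 (so the method is __next__() rather than next()); the class names CountDown, Shelf and ShelfIterator are hypothetical:

```python
class CountDown:
    """Scenario 1: the object is its own iterator."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self  # the object itself is the iterator

    def __next__(self):  # named next() in Python 2
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


class Shelf:
    """Scenario 2: __iter__() returns an object of a separate iterator class."""

    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        return ShelfIterator(self.items)  # a fresh iterator on every call


class ShelfIterator:
    """The iterator class: only this one implements __next__()."""

    def __init__(self, items):
        self.items = items
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.items):
            raise StopIteration
        value = self.items[self.index]
        self.index += 1
        return value


print(list(CountDown(3)))        # [3, 2, 1]
shelf = Shelf(["a", "b"])
print(list(shelf), list(shelf))  # ['a', 'b'] ['a', 'b']
```

One design consequence: a Shelf can be looped over again and again, because each __iter__() call creates a fresh ShelfIterator, while a CountDown is used up after a single pass.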

2. iterator
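As described above, an iterator is an object that supports next(). Note that objects supporting __iter__() are called iterables, because they are able to return an iterator. An iterator is also expected to implement __iter__() (returning itself), so every iterator is an iterable too.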

Before getting to the iterator itself, let's meet the container. A container usually refers to an abstract data type, meaning a collection of arbitrary objects: lists, dictionaries, and arrays can all be called containers. So what does an iterator do? As far as I know, an iterator is an efficient way to walk through all the objects in a container.

Instead of handing over all the objects at once, an iterator picks up one object at a time, each time next() is called, until it reaches the end of the container, at which point it raises a StopIteration exception. The for statement in Python also creates an iterator automatically: it calls iter() on the object, calls next() on every pass, and stops quietly when StopIteration is raised.
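Roughly, a for loop over a list behaves like the following hand-written loop (a simplified sketch; the names colors and seen are just for illustration):

```python
colors = ["red", "green", "blue"]
seen = []

# Equivalent to: for color in colors: seen.append(color)
it = iter(colors)
while True:
    try:
        color = next(it)
    except StopIteration:  # end of the container: leave the loop
        break
    seen.append(color)

print(seen)  # ['red', 'green', 'blue']
```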

The life of an iterator is limited: it can only be iterated once, after which any further call to next() raises StopIteration.
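A quick sketch of an iterator being used up:

```python
it = iter([1, 2])

first = list(it)   # [1, 2]: this pass walks through everything
second = list(it)  # []: the iterator is already exhausted

try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True  # next() on a used-up iterator raises StopIteration

print(first, second, exhausted)  # [1, 2] [] True
```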

3. generator
A common way to use an iterator: the iterator returns one object at a time, and the program does some work on that object, until the objects are exhausted. In fact, most built-in data types in Python support returning an iterator, as described here.
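A small sketch, assuming Python 3.7+ (where dictionaries keep insertion order): several built-in containers hand out iterators through iter(), while the containers themselves have no __next__() method, i.e. they follow the second scenario from section 1.

```python
containers = {
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "str": "abc",
    "dict": {"a": 1, "b": 2},  # iterating a dict yields its keys
}

# Ask each container for an iterator and take its first element
firsts = {name: next(iter(obj)) for name, obj in containers.items()}
print(firsts)  # {'list': 1, 'tuple': 1, 'str': 'a', 'dict': 'a'}

# The containers are iterables, not iterators
print(hasattr([1, 2, 3], "__next__"))        # False
print(hasattr(iter([1, 2, 3]), "__next__"))  # True
```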

A generator, on the other hand, can be considered an iterator with the ability to do work on the objects: it picks up one object at a time from a container and, instead of returning it directly, does some processing and returns the result. This makes code more compact when you want to do complicated work on objects.
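A minimal sketch using a generator expression, assuming the "stuff" is just stripping and lowercasing each word (a hypothetical processing step):

```python
words = ["  Apple", "BANANA ", " Cherry "]

# Nothing is processed yet: the generator only knows what to do per item
cleaned = (w.strip().lower() for w in words)

first = next(cleaned)  # processes only "  Apple"
rest = list(cleaned)   # processes the remaining two words

print(first)  # apple
print(rest)   # ['banana', 'cherry']
```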

What is yield?

4. yield
Most generators are created using yield. yield usually appears where a return statement would otherwise sit. Instead of simply returning the value that follows it and ending the function, yield does something more interesting: when the running code first reaches yield, it hands back the value following yield; at that point the generator function is suspended, i.e. paused, but all its local variables and state survive. The next time the caller asks the generator for a value, it resumes where it left off and continues until it hits yield again. This repeats until the function finishes (for example, when the container is exhausted), at which point the generator raises StopIteration.

In the code above, each time the for statement asks for the next value, execution goes into the square generator, runs until yield, hands back the squared result, and returns to the for statement.
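The original code isn't reproduced here; a minimal sketch of such a square generator, assuming it squares each number in a list, might look like this:

```python
def square(numbers):
    """Yield the square of each number, one at a time."""
    for n in numbers:
        yield n * n  # hand back one result, then pause right here


for s in square([1, 2, 3]):
    print(s)  # 1, then 4, then 9

gen = square([4, 5])
print(next(gen))  # 16: runs until the first yield, then suspends
print(next(gen))  # 25: resumes after yield, loops once more, yields again
# A third next(gen) would raise StopIteration: the for loop inside has ended.
```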

The advantage a generator brings is that the values it generates are created on the fly. This is different from first creating all the objects and then returning them all at once, or returning them one at a time (iterator!). A generator is dynamic: sometimes you don't know what to create until you actually run into the situation (gimme an example!).
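One possible illustration (my own choice, not from the tutorials): an endless Fibonacci sequence can't be built as a complete list first, but a generator computes each value only when it is asked for. itertools.islice takes just the first few.

```python
from itertools import islice


def fibonacci():
    """Yield Fibonacci numbers forever; each is computed only on request."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b  # the next value depends on the previous two


print(list(islice(fibonacci(), 10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```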

One last note: both iterators and generators can only be walked through once. Here is a subtle bug from misusing a generator.
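The original example isn't shown here; a minimal sketch of one common form of the bug, using a hypothetical evens() generator, is making two passes over the same generator:

```python
def evens(numbers):
    """Yield only the even numbers."""
    for n in numbers:
        if n % 2 == 0:
            yield n


gen = evens(range(10))
total = sum(gen)        # 20: this pass consumes the generator
count = len(list(gen))  # 0, not 5: gen is already exhausted

# One fix: materialize the values once if you need several passes
values = list(evens(range(10)))
fixed_total, fixed_count = sum(values), len(values)  # 20, 5
```

If the buggy version went on to compute an average (total / count), it would raise ZeroDivisionError, which is often how this bug shows up.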

5. why bother?
So why use iterators and generators? One major concern is a container with so many objects that putting them all in memory at once would overflow it, or make everything very slow. If you only use each object once, or you only want one particular object, it is more efficient to produce the objects one at a time. Iterators and generators also make code clearer and more concise.
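A rough sketch of the memory difference in CPython. sys.getsizeof() only reports the size of the object itself (for the list, its array of pointers, not the integers it points to), and the exact numbers vary across Python versions and platforms:

```python
import sys

squares_list = [n * n for n in range(1_000_000)]  # all million values in memory
squares_gen = (n * n for n in range(1_000_000))   # values produced one at a time

print(sys.getsizeof(squares_list))  # several megabytes
print(sys.getsizeof(squares_gen))   # a few hundred bytes at most, whatever the range

# Only want one particular object? Stop as soon as it is found.
first_big = next(n for n in squares_gen if n > 1000)
print(first_big)  # 1024
```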

By the way, I just learned that this is actually the predecessor of Python. Looking good.