Friday, July 13

Learning Python: iterators, generators, and iterable objects

People sometimes treat "iterator" as a fancy way of saying "a for loop", but it is more than that. Here I will walk through what I learned about Python's "iters" today.

There are three "iters" in Python:

  1. an iterator: an object that walks through the elements of a class object one at a time. It is supported by two methods, __iter__ and next (renamed __next__ in Python 3). A class that supports iteration hands out an iterator for one of its objects when you call iter on it; each call to next then retrieves the next element. Once the last element has been returned, the iterator is exhausted and cannot be reused: any further call to next raises a StopIteration exception.
  2. a generator: a function that uses yield; calling it returns an iterator.
  3. an iterable object: any object that iter can turn into an iterator. A container such as a list hands out a fresh iterator on every call, so it can be looped over repeatedly.
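
To make the first definition concrete, here is a minimal sketch of stepping through a list by hand. It is written in Python 3 syntax and uses the built-in next(it), which is equivalent to the it.next() method call used in the rest of this post; the variable names are only for illustration.

```python
# Step through a list by hand with the iterator protocol.
numbers = [1, 2, 3]
it = iter(numbers)          # ask the list for an iterator

first = next(it)            # 1
second = next(it)           # 2
third = next(it)            # 3

# The iterator is now exhausted: one more call raises StopIteration.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)
```

Note that the list itself is untouched; only the iterator object has been used up.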

With the definitions out of the way, let's look at some code.

1. built-in iterators

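# Python 2
L = [1, 2, 3, 4, 5]
D = {'a': 1, 'b': 2, 'c': 3}
i = iter(L)             # iterator over the list
print i
print D.iterkeys()      # iterators over the dictionary
print D.itervalues()
print D.iteritems()
for elt in i:           # the loop consumes the iterator
    print elt
i.next()                # nothing left, so this raises StopIteration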
'''
code output:
<listiterator object at 0x101cf6150>
<dictionary-keyiterator object at 0x101cc4e10>
<dictionary-valueiterator object at 0x101cc4e10>
<dictionary-itemiterator object at 0x101cc4e10>
1
2
3
4
5
Traceback (most recent call last):
  File "iter_survey.py", line 50, in <module>
    i.next()
StopIteration
'''

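Here I create four iterators: one from the list with iter, and three from the dictionary with its iterkeys, itervalues, and iteritems methods. For any object whose class supports iteration, you can create an iterator with iter. Moreover, when you just use a for loop directly: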
for elt in L:
    print elt
'''
output:
1
2
3
4
5
'''

Behind the scenes, the for loop automatically creates an iterator from L and then fetches one element from it on every pass, until the iterator raises StopIteration. So when you use a for loop, you are implicitly calling i = iter(L) once and i.next() repeatedly. But an iterator is more than a for loop, because we can also call next explicitly whenever we like.
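
The implicit calls described above can be written out by hand. This is only a sketch of the idea, not how the interpreter is actually implemented; manual_for is a hypothetical helper name, and the built-in next(it) stands in for it.next() so the code also runs on Python 3.

```python
def manual_for(iterable):
    """Collect the items of `iterable` in the order a for loop would visit them."""
    seen = []
    it = iter(iterable)          # what `for` does once, up front
    while True:
        try:
            item = next(it)      # what `for` does on every pass
        except StopIteration:    # the signal that ends the loop
            break
        seen.append(item)        # stands in for the loop body
    return seen

L = [1, 2, 3, 4, 5]
print(manual_for(L))
```

The loop body never sees StopIteration; the for statement catches it for you, which is why a plain for loop simply stops at the end.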

2. user defined iterators

Now suppose I have a class whose data contains a sequence, maybe a list. To make this class work as an iterator, I have to implement two methods, __iter__ and next:
class IterTest(object):
    def __init__(self, data):
        self._data = data
        self._count = 0         # current position in the sequence
    def __iter__(self):
        return self             # an iterator returns itself
    def next(self):
        if self._count == len(self._data):
            raise StopIteration # no more elements
        else:
            output = self._data[self._count]
            self._count += 1
            return output
d2 = IterTest("abcd")
I = iter(d2)
print I
for char in I:
    print char
'''
output:
<__main__.IterTest object at 0x10e813150>
a
b
c
d
'''

In __iter__, you normally just return the object itself. In next, you step through the sequence, decide when the iteration ends, and raise StopIteration at that point. Note that the counter is an instance variable set in the constructor and never reset afterwards, so each IterTest object can be iterated only once, which is exactly the requirement for an iterator; to start over, you create a new object, which initializes the counter again. Also, your class has to support iteration like this before its objects can be used in a for loop.
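
The one-pass behavior is easy to check. The sketch below uses a hypothetical class, OneShotIter, with the same shape as IterTest; it defines __next__ (the Python 3 name) and aliases it to next so that it should work on either version.

```python
class OneShotIter(object):
    """Hypothetical one-pass iterator with the same shape as IterTest."""
    def __init__(self, data):
        self._data = data
        self._count = 0              # position lives on the instance

    def __iter__(self):
        return self                  # an iterator is its own iterator

    def __next__(self):              # Python 3 name for the method
        if self._count == len(self._data):
            raise StopIteration
        item = self._data[self._count]
        self._count += 1
        return item

    next = __next__                  # Python 2 name, same method

it = OneShotIter("abcd")
first_pass = list(it)                # consumes every element
second_pass = list(it)               # _count was never reset: nothing left
fresh_pass = list(OneShotIter("abcd"))   # a new object starts from zero
print(first_pass, second_pass, fresh_pass)
```

The second pass comes back empty because the exhausted object keeps raising StopIteration; only a new object gives a new pass.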

3. generator

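class GenTest(object):
    def __init__(self, data):
        self._data = data
    def gen(self):
        count = 0               # local state, no instance variable needed
        while count < len(self._data):
            count += 1
            yield self._data[count - 1]
d = GenTest([4,3,2,1])
g = d.gen()
print g
print g.next()                  # takes the first element
for dat in g:                   # the loop continues from the second element
    print dat
g2 = d.gen()                    # a new generator starts from the beginning
print g2.next()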
'''
output:
<generator object gen at 0x102814280>
4
3
2
1
4
'''

Being a function that generates an iterator, a generator works much like iter. One difference is that it looks more concise in your class: you don't need an instance variable to keep track of the position in the sequence, because everything about the iteration now lives inside the generator function. It uses yield to implement the iteration; I am going to dig deeper into yield later, but basically it pauses the function each time it produces an element and resumes from the same point when the next element is requested.
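
To see the pausing for yourself, one option is to record what the function body does between calls. This is a small sketch with a hypothetical generator function, countdown, and a module-level list, log, used only for illustration.

```python
log = []

def countdown(data):
    """Hypothetical generator function that records when its body runs."""
    log.append("started")
    for item in data:
        log.append("about to yield %r" % item)
        yield item                   # pause here until the next next() call
    log.append("finished")

g = countdown([4, 3])
after_call = list(log)               # still empty: calling runs none of the body
first = next(g)                      # runs the body up to the first yield
after_first = list(log)
rest = list(g)                       # resumes after the yield and drains the rest
print(after_call, first, after_first, rest)
```

Calling countdown only creates the generator; the body starts on the first next and stops again at every yield.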

4. iterable objects

To be fair, everything I described above can be treated as a "subclass" of iterable objects with a special constraint: an iterator can be used only once. A general iterable object can be looped over many times, because every call to iter on it (and so every for loop) gets a fresh iterator that starts again from the beginning of the sequence.

class IterableTest(object):
    def __init__(self, data):
        self._data = data
    def __iter__(self):
        return self.iterall()   # a new generator on every call
    def iterall(self):
        count = 0
        while count < len(self._data):
            count += 1
            yield self._data[count - 1]
if __name__ == '__main__':
    d3 = IterableTest([4,3,2,1])
    print d3
    for dat in d3:
        print dat
    for dat in d3:              # a second loop works too
        print dat
'''
output:
<__main__.IterableTest object at 0x10545f250>
4
3
2
1
4
3
2
1
'''

If d3 were an iterator, looping over it twice would display only one iteration, because the first loop would exhaust it. By making your class iterable, it now behaves like the built-in iterable objects, such as list.
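
The same contrast can be checked on a built-in list. The sketch below uses illustrative variable names and the built-in next(it) so that it runs on Python 3 as well.

```python
data = [4, 3, 2, 1]                  # a list is iterable, but not an iterator

it_a = iter(data)
it_b = iter(data)
independent = it_a is not it_b       # two separate iterators over one list
next(it_a)                           # advancing one of them...
b_first = next(it_b)                 # ...does not move the other

it = iter(data)
same_object = iter(it) is it         # an iterator hands back itself
first_loop = [x for x in it]         # uses the iterator up
second_loop = [x for x in it]        # nothing left the second time
print(independent, b_first, same_object, first_loop, second_loop)
```

So the difference comes down to what iter returns: a new object each time for an iterable container, the same object for an iterator.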

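So what is the take-home message? There are several situations in which you have to implement iteration yourself, for example when your class holds a sequence of data in an instance variable and you want to loop through it. In that case, giving the class an __iter__ method that returns a generator, as in IterableTest, lets its objects be looped over more than once.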
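
As a closing sketch of that situation, here is a hypothetical container class, Scores, whose __iter__ is itself a generator function. Beyond for loops, an iterable class also works with built-ins that consume iterables, such as list, max, sorted, and the in operator.

```python
class Scores(object):
    """Hypothetical container that keeps its data in an instance variable."""
    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        # A generator method: every call returns a fresh iterator.
        for value in self._values:
            yield value

scores = Scores([70, 95, 82])
as_list = list(scores)               # [70, 95, 82]
highest = max(scores)                # 95
ordered = sorted(scores)             # [70, 82, 95]
has_95 = 95 in scores                # True
print(as_list, highest, ordered, has_95)
```

Each of these calls asks scores for a new iterator, so they can be made in any order and any number of times.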
