Iterators
You previously learned about several iterable objects, such as lists, sets, and tuples. In Python, an object is iterable if it defines an __iter__ method, which returns an iterator; the iterator, in turn, defines a __next__ method that hands back its elements one at a time, in sequence. Lists, sets, and tuples all support this protocol, which lets us step through their contents simply and efficiently. For this reason, we often use these data types when iterating through the lines of a file or the entries in a directory listing, or when trying to identify a file based on a series of file signatures.
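To see this protocol at the interpreter, we can check that a list defines __iter__ and that the object returned by iter() defines __next__:
>>> values = [1, 2, 3]
>>> hasattr(values, '__iter__')
True
>>> it = iter(values)
>>> hasattr(it, '__next__')
True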
An iterator lets us step through data in a way that consumes elements as we go rather than preserving the initial object. This may seem undesirable; however, it is very useful when working with large datasets or on machines with limited resources, because an iterator keeps only the active element in memory. Stepping through every line of a 3 GB file, for example, feeds us one line at a time, so we can handle each line in order without ever loading the whole file into memory.
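A file object behaves the same way: iterating over it yields one line at a time, so even a very large file is never read into memory all at once. The snippet below is a minimal sketch of this, assuming a hypothetical large log file named evidence.log exists in the working directory:
>>> line_count = 0
>>> with open('evidence.log') as log_file:  # evidence.log is a hypothetical large file
...     for line in log_file:               # the file object hands back one line per iteration
...         line_count += 1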
The following code block steps through the basic usage of iterators. We call the next() function on an iterator to retrieve its next element. Once an element has been consumed by next(), it is no longer available from the iterator, as the cursor has moved past it. When we reach the end of the iterator, any additional next() call raises a StopIteration exception. This exception lets us exit loops gracefully and alerts us when there is no content left to read from the iterator:
>>> y = iter([1, 2, 3])
>>> next(y)
1
>>> next(y)
2
>>> next(y)
3
>>> next(y)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
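To exit a loop gracefully, we can catch StopIteration ourselves; this is exactly what a for loop does for us behind the scenes. The following sketch drains the same kind of iterator until the exception signals that it is empty:
>>> y = iter([1, 2, 3])
>>> while True:
...     try:
...         print(next(y))
...     except StopIteration:  # raised once the iterator is exhausted
...         break
1
2
3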
The reversed() built-in function can be used to create a reversed iterator. In the following example, we reverse a list and retrieve each element from the new iterator using the next() function:
>>> j = reversed([7, 8, 9])
>>> next(j)
9
>>> next(j)
8
>>> next(j)
7
>>> next(j)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
By writing generators, we can take further advantage of iterators. Generators are a special type of function that produces an iterator object. They resemble the functions discussed in Chapter 1, Now for Something Completely Different, except that instead of returning a single result, they yield a series of values, one at a time, through the iterator they create. Like plain iterators, generators are best used with large datasets that would otherwise consume vast quantities of memory.
The following code block shows the implementation of a generator. In the file_sigs() function, we create a list of tuples and store it in the sigs variable. We then loop through each element in sigs and yield it. Calling file_sigs() creates a generator, allowing us to use the next() function to retrieve each tuple individually and limit the generator's memory impact. See the following code:
>>> def file_sigs():
...     sigs = [('jpeg', 'FF D8 FF E0'),
...             ('png', '89 50 4E 47 0D 0A 1A 0A'),
...             ('gif', '47 49 46 38 37 61')]
...     for s in sigs:
...         yield s
>>> fs = file_sigs()
>>> next(fs)
('jpeg', 'FF D8 FF E0')
>>> next(fs)
('png', '89 50 4E 47 0D 0A 1A 0A')
>>> next(fs)
('gif', '47 49 46 38 37 61')
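Generators shine when the data is too large to hold in memory at once. As a minimal sketch of that use case, the generator below reads a file in fixed-size chunks and yields one chunk at a time, so only a single chunk is ever in memory; the file name image.dd and the chunk size are placeholders for your own evidence file and preferred read size:
>>> def read_chunks(path, size=1024):
...     with open(path, 'rb') as evidence:
...         chunk = evidence.read(size)
...         while chunk:
...             yield chunk                # hand back one chunk and pause here
...             chunk = evidence.read(size)
>>> for chunk in read_chunks('image.dd'):  # image.dd is a hypothetical evidence file
...     pass                               # examine each chunk, for example against file_sigs()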