Stream

Learning Objectives

At the end of this sub-unit, students should

understand the peculiarities of the built-in map and filter.
know how to avoid pitfalls related to map and filter.

A Deeper Look

There are quite a number of things we did not explain properly about the built-in map and filter. This was hidden by the way we were using them, but the more perceptive of you probably noticed that we only ever used them in a for-loop. If you tried them out on your own, believing that the results of map and filter are sequences, you would have been disappointed, because they are not. Consider the simple case of indexing. We get the following error if we try to index the result of map or filter.

Map Indexing

lst1 = [1, 2, 3, 4]
lst2 = map(lambda elem: elem + 1, lst1)
print(lst2[0])

`TypeError: 'map' object is not subscriptable`

Filter Indexing

lst1 = [1, 2, 3, 4]
lst2 = filter(lambda elem: elem % 2 == 0, lst1)
print(lst2[0])

`TypeError: 'filter' object is not subscriptable`

This clearly shows that the results of map and filter are not sequences. They are definitely not lists or tuples. So what are they? Recall that we can check the type using the type(...) function. Let us check.

lst1 = [1, 2, 3, 4]
lst2 = map(lambda elem: elem + 1, lst1)
lst3 = filter(lambda elem: elem % 2 == 0, lst1)
print(type(lst2))
print(type(lst3))

`<class 'map'>`
`<class 'filter'>`

That does not help much, but it at least confirms that the result is neither a list nor a tuple. The question is, what is it exactly? We did not have a name for this previously, so let us give it one. Recall the operation that we can perform on the result: we can iterate over it with a for-loop. So it is reasonable to call this kind of type an iterable.

Iterable

Consider a data type \(T\) with a variable var of type \(T\). We say that \(T\) is an iterable if the following operation can be performed, assuming valid input.

Iteration: It can be used in a for-loop (i.e., for elem in var).
- Without the for keyword, the expression checks whether an element is inside the iterable (i.e., elem in var).
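The distinction between an iterable and a sequence can also be checked programmatically. The following is a minimal sketch, assuming the standard-library module collections.abc, which has not been introduced in these notes. Its Iterable and Sequence classes can be used with isinstance to ask which operations a value supports.

```python
from collections.abc import Iterable, Sequence

lst = [1, 2, 3, 4]
mapped = map(lambda elem: elem + 1, lst)

# A list is both an iterable and a sequence.
print(isinstance(lst, Iterable))     # True
print(isinstance(lst, Sequence))     # True

# The result of map is an iterable, but not a sequence.
print(isinstance(mapped, Iterable))  # True
print(isinstance(mapped, Sequence))  # False
```

The same checks give the same answers for the result of filter.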

So the result of map and filter is an iterable¹. If we arrange these types in a hierarchy, it looks like the following. Note that map and filter are special kinds of iterables.

Hierarchy

Single Use

So we know that the results of map and filter are at least iterables. They are actually a special kind of iterable. In Python terminology, the more precise name is an iterator, and it behaves much like a generator. However, we will use the name stream as it is the more commonly accepted name in general.

A stream is an iterable that is single use. It is like looking at a flowing stream of water. Once the water has flowed past, if nothing replenishes the stream (e.g., a continuous water source), then the stream will dry up. It can no longer be used.

The same is true for map and filter. Once the data is used up (e.g., in a for-loop), the next time we try to use it, it will be empty. We can illustrate this with the examples below. Note how the second for-loop on the same result of map and filter does not produce any output.

Map

>>> lst = [1, 2, 3, 4]
>>> stream = map(lambda item: item, lst)
>>> for elem in stream:
...   print(elem)
...
1
2
3
4
>>> for elem in stream:  # same stream
...   print(elem)
...
>>> # nothing is printed!

Filter

>>> lst = [1, 2, 3, 4]
>>> stream = filter(lambda item: True, lst)
>>> for elem in stream:
...   print(elem)
...
1
2
3
4
>>> for elem in stream:  # same stream
...   print(elem)
...
>>> # nothing is printed!
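The for-loop is not the only operation that uses up a stream. The sketch below shows two more cases under the same single-use rule. It assumes list(...), which collects whatever is left in a stream into a list, and the variable names first, second, found and rest are only illustrative.

```python
stream = map(lambda item: item * 2, [1, 2, 3, 4])
first = list(stream)    # consumes the whole stream
second = list(stream)   # the stream is already dry
print(first)   # [2, 4, 6, 8]
print(second)  # []

# Pitfall: a membership test also consumes elements while searching.
stream = filter(lambda item: item % 2 == 0, [1, 2, 3, 4, 5, 6])
found = 4 in stream     # reads 2, then 4, and stops
rest = list(stream)     # only what is left after the search
print(found)  # True
print(rest)   # [6]
```

In the second half, the elements 2 and 4 are gone after the membership test, even though no for-loop was written.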

Why Single Use?

The reason both map and filter are single use is that their results are computed lazily. In other words, unless the data is actually used, nothing is computed. Because of this, once a value is computed and used, it is immediately discarded to save space. Try running the following two snippets and see the extreme difference in timing: the first only creates the stream and finishes almost instantly, while the second actually computes every value.

map(lambda item: item * 2, range(100_000_000))

for elem in map(lambda item: item * 2, range(100_000_000)):
    pass  # do nothing
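Laziness can also be observed directly, without timing anything. The following is a small sketch in which the function passed to map records every item it is applied to. The names calls and double are hypothetical, and next(...) is a built-in, not covered in this sub-unit, that asks a stream for just one value.

```python
calls = []  # records every item the function is actually applied to

def double(item):
    calls.append(item)
    return item * 2

stream = map(double, [1, 2, 3])
calls_at_creation = list(calls)   # copy of the record right after map(...)
first = next(stream)              # ask for one value only
calls_after_first = list(calls)   # copy of the record after one value
rest = list(stream)               # ask for everything that remains

print(calls_at_creation)  # []
print(first)              # 2
print(calls_after_first)  # [1]
print(rest)               # [4, 6]
print(calls)              # [1, 2, 3]
```

Creating the stream calls double zero times. Each value is computed only at the moment it is requested.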

Reusing Map and Filter

Given this single-use limitation, what can we do to preserve the values? At the beginning, we mentioned that a type (e.g., int, float, etc.) can be used to convert from one data type to another. This is a very useful lesson because it also works for list and tuple. And that is exactly how the result of map or filter can be made permanent. We can convert it into a list (or tuple) and assign it to a variable.

Map

>>> lst = [1, 2, 3, 4]
>>> seq = list(map(lambda item: item, lst))
>>> for elem in seq:
...   print(elem)
...
1
2
3
4
>>> for elem in seq:
...   print(elem)
...
1
2
3
4

Filter

>>> lst = [1, 2, 3, 4]
>>> seq = list(filter(lambda item: True, lst))
>>> for elem in seq:
...   print(elem)
...
1
2
3
4
>>> for elem in seq:  # same seq
...   print(elem)
...
1
2
3
4

With this, we can give a more appropriate solution for map_n and filter_n from the previous sub-unit.

Multiply by n

def map_n(lst, n):
  return list(map(lambda item: item * n, lst))

Keep Multiple of n

def filter_n(lst, n):
  return list(filter(lambda item: item % n == 0, lst))
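As a quick check, the sketch below repeats the two definitions so that it runs on its own, then uses the results in ways a stream does not allow. The input list and the variable names multiplied and multiples are illustrative only.

```python
def map_n(lst, n):
    return list(map(lambda item: item * n, lst))

def filter_n(lst, n):
    return list(filter(lambda item: item % n == 0, lst))

multiplied = map_n([1, 2, 3, 4], 3)
multiples = filter_n([1, 2, 3, 4], 2)

print(multiplied)      # [3, 6, 9, 12]
print(multiplied[0])   # 3 (indexing works because it is a list)
print(multiples)       # [2, 4]
print(len(multiples))  # 2

# The results can be iterated more than once.
for elem in multiples:
    print(elem)
for elem in multiples:
    print(elem)
```

Both for-loops at the end print 2 and 4, since a list is not used up by iteration.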

For good measure, and to avoid potential errors in the future, always convert the result of map and filter into a list or tuple. The only reason not to convert is efficiency, but at this level we are more concerned with correctness.

¹ To be even more precise, map and filter are actually data types of their own. They are constructors rather than functions. We can see this from the output <class 'map'>.
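The claim in this note can be checked with type(...) and isinstance(...). This is only a small sketch: len is used as an arbitrary example of an ordinary built-in function for comparison.

```python
print(type(map))     # <class 'type'>
print(type(filter))  # <class 'type'>
print(type(len))     # <class 'builtin_function_or_method'>

# Calling map(...) constructs a value whose type is map itself.
stream = map(lambda item: item + 1, [1, 2, 3])
print(isinstance(stream, map))  # True
```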