Understanding Python Comprehension and Generator Expressions

Let’s take a look at the following Python expressions:

( i for i in iterable )
[ i for i in iterable ]
{ i for i in iterable }
{ i : j for i, j in iterable }

These four expressions look alike, but they serve different purposes. The first one is called a “generator expression”; the others are “comprehensions” for lists, sets, and dictionaries, respectively. A comprehension is evaluated eagerly into a complete in-memory object. A generator expression instead produces the items one at a time, on demand, so the results never need to be stored all at once. Python 3.6 introduced asynchronous versions of both comprehensions and generator expressions, but we’re not going to address those here.
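To make the difference concrete, the sketch below builds each of the four expressions and checks what kind of object comes back. The sample data in pairs is made up for this example. The sketch also compares sys.getsizeof for a large list and an equivalent generator expression. The exact byte counts are CPython implementation details, so only the comparison between them is meaningful.

```python
import sys

pairs = [(1, "a"), (2, "b"), (3, "c")]  # made-up sample data

gen = (i for i, _ in pairs)     # generator expression: evaluated lazily
lst = [i for i, _ in pairs]     # list comprehension
st = {i for i, _ in pairs}      # set comprehension
dct = {i: j for i, j in pairs}  # dict comprehension

print(type(gen).__name__, type(lst).__name__, type(st).__name__, type(dct).__name__)
# generator list set dict

# A list stores every item; a generator only keeps its current state.
big_list = [n for n in range(100_000)]
big_gen = (n for n in range(100_000))
print(sys.getsizeof(big_list) > sys.getsizeof(big_gen))  # True
```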


A closer look at Python Comprehensions

Comprehensions are an extension to the Python syntax for list, set and dict literals. The Python manual uses “literals” for the expressions yielding immutable objects (strings, bytes, integers, floats, complex numbers). It uses “displays” for the expressions yielding mutable objects (lists, sets, dicts). Moreover, it says about displays: “either the container contents are listed explicitly, or they are computed via a set of looping and filtering instructions, called a comprehension.” It’s these comprehensions that we want to dive into.

A list comprehension gives you a concise way to build a new list from the items of any iterable, transforming or filtering them along the way. Set and dict comprehensions do the same for sets and dictionaries. Only these built-in mutable container types have comprehension expressions. The general syntax for a list literal (a list display) is:

[ expr1, expr2, ..., *iterable, ... ]

where expr1, expr2 are expressions representing a single item in the list, while iterable is an expression for an iterable object whose items will be added one by one to the list. I know this is still confusing. Bear with us! We’re going to go over several examples to make it clearer.

Let’s go back to your Python fundamentals for a bit. Recall that an iterable is an object that provides the __iter__() or the __getitem__() method. The __getitem__() method provides random access to the object’s items through indexing. The __iter__() method only grants sequential access through an iterator.
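Here is a minimal sketch of that distinction, using a hypothetical Squares class that defines only __getitem__(). Even without __iter__(), the built-in iter() accepts it. It does this by calling __getitem__() with 0, 1, 2, ... until IndexError is raised, so the object also works in a for loop or a comprehension.

```python
class Squares:
    """Hypothetical sequence-like class that defines only __getitem__()."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        # Random access by index; IndexError tells iter() to stop.
        if not 0 <= index < self.n:
            raise IndexError(index)
        return index * index


sq = Squares(5)
print(sq[3])                     # 9: random access through indexing
print(list(iter(sq)))            # [0, 1, 4, 9, 16]: sequential access
print([x for x in sq if x % 2])  # [1, 9]: comprehensions work too
```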

Let’s look at a couple examples:

[ *[1, 2, 3], 4, 5, min(6, 7) ]

[1, 2, 3] is an iterable (a list), min(6, 7) is an expression that evaluates to 6, and the other items are integers. The whole expression evaluates to [ 1, 2, 3, 4, 5, 6 ].

It’s worth noting that the asterisk * notation before the iterable (called iterable unpacking) is only allowed in list, set and tuple displays since Python 3.5. Older versions such as Python 2.7 reject it. The asterisk tells Python to insert the individual elements of the iterable you specify. If it doesn’t work for you, just remove the asterisk in the following examples. You won’t get exactly the same results I present in this tutorial, because each iterable will be inserted as a single item. It’ll at least eliminate the “invalid syntax” error.

Here’s another example.

[ 1, 2, *"home", *(lambda x, y: range(x, y))(4, 7) ]

1 and 2 are integers; "home" is an iterable (a string); lambda x, y: range(x, y) is an anonymous function returning a range object (which is also an iterable); and (lambda x, y: range(x, y))(4, 7) calls that function, which returns range(4, 7).

This expression produces:

[ 1, 2, 'h', 'o', 'm', 'e', 4, 5, 6 ]

It’s worth noting that a lambda expression allows you to define a function object without naming it. On the left side of the : there is an optional list of function parameters. On the right side there is a single expression, which is evaluated and returned as the result whenever the function is called. We’ll save our discussion on lambda expressions for another tutorial.

Anyway, as we can see from the examples above, a display requires each member of the new object to be listed one after the other, delimited by commas. Comprehension expressions instead let us iterate over the items of an iterable without listing all the items of the source object.

For example, the expression:

l = [ i for i in range(1, 8, 2) ]

is equivalent to

l = []
for i in range(1, 8, 2):
    l.append(i)

Both snippets produce [1, 3, 5, 7].
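The same kind of expansion applies to the other comprehension types. A set comprehension corresponds to a loop that calls add(), and a dict comprehension corresponds to a loop that assigns one key at a time. The small example below is our own and checks both equivalences.

```python
# Set comprehension and its loop expansion
s = {i % 3 for i in range(1, 8, 2)}
s_loop = set()
for i in range(1, 8, 2):
    s_loop.add(i % 3)

# Dict comprehension and its loop expansion
d = {i: i * i for i in range(1, 8, 2)}
d_loop = {}
for i in range(1, 8, 2):
    d_loop[i] = i * i

print(s == s_loop, s)  # True {0, 1, 2}
print(d == d_loop, d)  # True {1: 1, 3: 9, 5: 25, 7: 49}
```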

A comprehension and its corresponding loop expansion are not perfectly equivalent, because of scoping rules. In Python 3, the loop variable of a comprehension is local to the comprehension. It never rebinds a variable with the same name outside it, while a for loop does. We’ll talk more about this in our Rules section.

l = [ (i, j) for i, j in zip(range(0, 7, 2), range(1, 8, 2)) ]

equals

l = []
for i, j in zip(range(0, 7, 2), range(1, 8, 2)):
    l.append((i, j))

and evaluates to [ (0, 1), (2, 3), (4, 5), (6, 7) ].

l = [ (i, j) for i in range(0, 7, 2) for j in range(1, 8, 2) ]

is the same as

l = []
for i in range(0, 7, 2):
    for j in range(1, 8, 2):
        l.append((i, j))

The loop version is longer, but both versions set the variable l equal to

[(0, 1), (0, 3), (0, 5), (0, 7), (2, 1), (2, 3), (2, 5), (2, 7), (4, 1), (4, 3), (4, 5), (4, 7), (6, 1), (6, 3), (6, 5), (6, 7)]

That’s the advantage of concise comprehension expressions.
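If the ordering of the pairs seems surprising, compare it with itertools.product from the standard library. It produces the same pairs in the same order: the leftmost for clause varies slowest and the rightmost one varies fastest. This is only a sanity check of the ordering, not a recommendation to prefer one form over the other.

```python
from itertools import product

nested = [(i, j) for i in range(0, 7, 2) for j in range(1, 8, 2)]

# Leftmost for clause = slowest-changing element, as in itertools.product
print(nested == list(product(range(0, 7, 2), range(1, 8, 2))))  # True
print(len(nested))  # 16 (4 values of i times 4 values of j)
```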


Get Our Python Developer Kit for Free

I put together a Python Developer Kit with over 100 pre-built Python scripts covering data structures, Pandas, NumPy, Seaborn, machine learning, file processing, web scraping and a whole lot more - and I want you to have it for free. Enter your email address below and I'll send a copy your way.

Yes, I'll take a free Python Developer Kit

Comprehensions also allow other clauses, called “filters”, so you can select a subset of the items from the source object. A filter is introduced by an if clause in your comprehension expression.

If the filter’s condition evaluates to an object obj such that bool(obj) is True, the current element is inserted in the new container object. Otherwise it is silently discarded. If more than one filter is provided, an item is kept only if all of them are true. The combined filter is the logical conjunction (and) of all filters, evaluated from left to right. I know this is confusing, but take a look at what we mean with another example:

[ i for i in iterable if cond1 if cond2 ]

is equivalent to

[ i for i in iterable if cond1 and cond2 ]

Notice the and in the second expression instead of the second if filter.
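Like the and operator, chained filters short-circuit: the second condition is only tested for items that passed the first one. The sketch below uses two hypothetical predicates, cond1 and cond2, that count how often they are called. It shows that the two forms agree and that cond2 is skipped for items cond1 rejects.

```python
calls = {"cond1": 0, "cond2": 0}

def cond1(i):
    calls["cond1"] += 1
    return i % 2 == 0  # keep even numbers

def cond2(i):
    calls["cond2"] += 1
    return i > 4       # keep numbers greater than 4

a = [i for i in range(10) if cond1(i) if cond2(i)]
b = [i for i in range(10) if cond1(i) and cond2(i)]
print(a, a == b)  # [6, 8] True
print(calls)      # {'cond1': 20, 'cond2': 10}: cond2 only ran for even i
```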

Let’s see a few more examples to help these complex comprehension expressions sink in. This short line of code:

l = [ i for i in range(10) if i % 2 == 0 ]

produces the same result as this longer loop:

l = []
for i in range(10):
    if i % 2 == 0:
        l.append(i)

They both evaluate to [ 0, 2, 4, 6, 8 ].

This next example is kind of cool. See if you can follow along!

l = [ (i, j) for i in range(0, 3) for j in "PyThoN" if j.isupper() if i % 2 == 0 ]

evaluates to [(0, 'P'), (0, 'T'), (0, 'N'), (2, 'P'), (2, 'T'), (2, 'N')].
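If you read the clauses from left to right as nested blocks, that comprehension expands into the loop below. The expansion also shows that the i % 2 == 0 filter is tested inside the inner loop, once per character, even though it only depends on i.

```python
l = []
for i in range(0, 3):
    for j in "PyThoN":
        if j.isupper():        # first filter: keep uppercase letters
            if i % 2 == 0:     # second filter: keep even values of i
                l.append((i, j))

print(l)
# [(0, 'P'), (0, 'T'), (0, 'N'), (2, 'P'), (2, 'T'), (2, 'N')]
```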


A closer look at Python Generator Expressions

Generator expressions are shorthand for defining and immediately calling an anonymous generator function. How was that for a mouthful? A generator function is basically just a function that contains at least one yield statement, which hands back values one at a time instead of returning a single result. A generator expression lets you condense such a generator into a single line of code, without defining a full function with a yield statement.

Calling a generator function returns a generator object, which can be advanced repeatedly. Each time it’s advanced, it produces a new value, until it raises a StopIteration exception. Generators are usually consumed by a for-loop. They can also be advanced manually using next() or send(), and disposed of using close().

Here’s a simple Generator Expression.

even = ( i**2 for i in range(10) if i % 2 == 0 )

The above generator expression is equivalent to the following generator function definition, which we’ll call g:

def g(n):
    # Yield the squares of the even numbers below n
    for i in range(n):
        if i % 2 == 0:
            yield i**2

Our generator expression, which we assigned to even, is equivalent to the generator returned by calling g with the argument 10, that is, g(10).
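One way to check this equivalence is to drain both into lists and compare them. The snippet repeats the definition of g so that it runs on its own. It also shows a common gotcha: a generator can only be consumed once.

```python
def g(n):
    # Generator function: yields the squares of the even numbers below n
    for i in range(n):
        if i % 2 == 0:
            yield i**2

even = (i**2 for i in range(10) if i % 2 == 0)

print(list(even) == list(g(10)))  # True
print(list(g(10)))                # [0, 4, 16, 36, 64]: a fresh generator
print(list(even))                 # []: even was already exhausted above
```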

Like any generator, even can be used inside a for-loop, like this:

even = ( i**2 for i in range(10) if i % 2 == 0 )
for item in even:
    print(item)

We can also get one value at a time from the generator with next(even) or even.send(None), and dispose of the generator before it is exhausted with even.close().
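Here is a minimal sketch of those calls, using the same even generator expression. For a generator expression, send(None) advances it exactly like next(). After close() the generator is finished, so next() with a default value simply returns that default.

```python
even = (i**2 for i in range(10) if i % 2 == 0)

first = next(even)        # 0
second = even.send(None)  # 4: send(None) advances the generator like next()
print(first, second)      # 0 4

even.close()                    # dispose of the generator early
print(next(even, "exhausted"))  # exhausted: no more values after close()
```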

Taking it one step further, we can build our own class simulating an iterable object. The following class, EvenSequence, simulates a container of the even integers from start to stop (inclusive). It is an iterable because it provides, at a minimum, the __iter__() method. This iterable also happens to provide a __getitem__() method for random access to its items through indexing, and a __contains__() method for membership testing.

class EvenSequence:

    def __init__(self, start, stop):
        if (start % 2) != 0:
            start += 1
        if start >= stop:
            raise ValueError("Upper bound <= lower bound.")
        self.start, self.stop = start, stop

    def __getitem__(self, i):
        if i < 0 or i > (self.stop-self.start)// 2:
            raise IndexError("Index out of range.")
        else:
            return self.start + (i * 2)

    def __contains__(self, i):
        return (i % 2) == 0 and self.start <= i <= self.stop

    def __iter__(self):
        for i in range(self.start, self.stop+1, 2):
            yield i

We can now use an object from this custom class in a generator expression, like this:

( i for i in EvenSequence(2, 40) if i % 3 == 0 )

This iterates over the even numbers from 2 to 40 that are also multiples of 3. The results? The generator yields 6, 12, 18, 24, 30 and 36.

After defining the class, you can iterate through the generator and print each item it produces with a Python snippet like this:

l=( i for i in EvenSequence(2, 40) if i % 3 == 0 )
for item in l:
    print(item)
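Because EvenSequence also defines __getitem__() and __contains__(), the same object supports indexing and the in operator as well as iteration. The sketch below repeats the class exactly as defined above so that it runs on its own. The variable names seq and multiples are just for illustration.

```python
class EvenSequence:

    def __init__(self, start, stop):
        if (start % 2) != 0:
            start += 1
        if start >= stop:
            raise ValueError("Upper bound <= lower bound.")
        self.start, self.stop = start, stop

    def __getitem__(self, i):
        if i < 0 or i > (self.stop-self.start)// 2:
            raise IndexError("Index out of range.")
        else:
            return self.start + (i * 2)

    def __contains__(self, i):
        return (i % 2) == 0 and self.start <= i <= self.stop

    def __iter__(self):
        for i in range(self.start, self.stop+1, 2):
            yield i


seq = EvenSequence(2, 40)
print(seq[0], seq[19])       # 2 40: indexing via __getitem__()
print(10 in seq, 11 in seq)  # True False: membership via __contains__()
multiples = [i for i in seq if i % 3 == 0]  # iteration via __iter__()
print(multiples)             # [6, 12, 18, 24, 30, 36]

try:
    seq[20]                  # one past the last valid index
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: Index out of range.
```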

Note that comprehensions and generator expressions are themselves expressions. They can therefore appear anywhere the syntax requires an expression, including inside other comprehensions and generator expressions.

Here’s an example of a generator expression nested inside another generator expression:

for g in ( ( i for i in range(j) if i % 3 == 0) for j in (12, 30) ):
    for k in g:
        print(k)

which prints the multiples of 3 in range(0, 12) and the multiples of 3 in range(0, 30).
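If you want to keep the two groups instead of printing them, the same nesting works with list comprehensions. This sketch builds one list per value of j.

```python
# Outer comprehension over j; inner comprehension collects multiples of 3
groups = [[i for i in range(j) if i % 3 == 0] for j in (12, 30)]
print(groups[0])  # [0, 3, 6, 9]
print(groups[1])  # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
```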


Scoping and evaluation rules for generator and comprehension expressions

There are several rules we need to keep in mind when working with generator and comprehension expressions. We’ll go over them now.

Rule 1: Name Binding

The loop variables of a generator or comprehension can’t change the binding of a name outside its scope. For example:

i = 0
l = []
for i in range(1, 10, 2):
    l.append(i)

rebinds the name i to the object 9. With a comprehension expression, however, the binding of i outside the comprehension doesn’t change:

i = 0
l = [ i for i in range(1, 10, 2) ]

The i outside the comprehension scope is still bound to 0 after evaluating the comprehension. In both cases l==[1, 3, 5, 7, 9].

NOTE: This is not true for Python 2! List comprehensions in Python 2 rebind the name to 9 in the example above. Generator expressions don’t, and neither do the set and dict comprehensions added in Python 2.7. Python 3 changed list comprehensions to behave the same way.
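The following sketch (Python 3 only) checks Rule 1 directly. After a list comprehension or a generator expression, the outer i keeps its value, while a plain for loop rebinds it.

```python
i = 0
l = [i for i in range(1, 10, 2)]
print(i, l)  # 0 [1, 3, 5, 7, 9]: the comprehension's i is local to it

i = 0
squares = list(i * i for i in range(1, 10, 2))
print(i)     # 0: a generator expression's i is local to it as well

i = 0
for i in range(1, 10, 2):
    pass
print(i)     # 9: a plain for loop rebinds i
```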

Rule 2: Name Scoping

The usual scoping rules for Python names apply to the iterable in the leftmost for clause, which is evaluated in the enclosing scope. All other for and if clauses, and the element expression, are evaluated in the comprehension’s own nested scope. There, the names bound by earlier for clauses are visible.

For example:

[ (x, y) for x in range(3) for y in range(x, x+3) ]

will yield:

[(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (2, 4)]

Here we can see that the name x bound by the leftmost for clause is referenced by the rightmost for clause. Now take the following expression, in which every for clause rebinds x:

[ x for x in range(3) for x in range(x, x+2) for x in range(x) ]

and let’s consider the corresponding for-loop:

l = []
# outer loop
for x in range(3):
    # middle loop
    for x in range(x, x+2):
        # inner loop
        for x in range(x):
            l.append(x)

In the first cycle, the outer loop binds x=0. The middle loop is in the body of the outer loop, so the range object is created as range(0, 2), then it rebinds x=0. The inner loop is in the body of the middle loop, so its range object is created as range(0), which immediately exits the loop without inserting items into l, and so on. So the comprehension evaluates to [ 0, 0, 0, 1, 0, 1, 0, 1, 2 ].
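Rule 2 has a visible consequence in class bodies, because the comprehension’s nested scope cannot see names defined at class level. Only the leftmost iterable is evaluated in the class scope. A class attribute can therefore be used there, but not in the element expression. The class names Grid and BrokenGrid are made up for this sketch.

```python
class Grid:
    size = 3
    # range(size) is the leftmost iterable, evaluated in the class scope: OK
    cells = [i * i for i in range(size)]

print(Grid.cells)  # [0, 1, 4]

error_name = None
try:
    class BrokenGrid:
        size = 3
        # Here size is looked up from the comprehension's nested scope,
        # which cannot see class-level names, so this raises NameError
        cells = [i * size for i in range(3)]
except NameError as exc:
    error_name = type(exc).__name__
    print("NameError:", exc)
```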

Rule 3: Expression Syntax

The comprehension and generator expression syntax mandates that the expression start with at least one for clause, followed by zero or more if or for clauses that can be freely intermingled.

The elements of the new container are those that would be produced by considering each of the for and if clauses as a block, from left to right, and a new item will be inserted in the new container if the evaluation reaches the innermost block.

l = [ (x, y, z) for x in range(1, 10) if x % 2 == 0
                for y in range(1, 15) if (x+y) % 3 == 0
                for z in range(3, 15) if x+y+z == 10 ]

This is the same as saying:

l = []
for x in range(1, 10):
    if x % 2 == 0:
        for y in range(1, 15):
            if (x+y) % 3 == 0:
                for z in range(3, 15):
                    if x+y+z == 10:
                        l.append((x, y, z))

Both expressions evaluate to [ (2, 1, 7), (2, 4, 4), (4, 2, 4) ]. That’s pretty amazing, isn’t it?
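To check the syntax rule itself, the sketch below uses the built-in compile() to test whether a string parses as an expression. The helper is_valid is hypothetical and written only for this example. An if clause may follow any for clause, but it cannot appear before the first for clause.

```python
def is_valid(source):
    """Return True if source parses as a single Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

print(is_valid("[x for x in range(5) if x for y in range(x)]"))  # True
print(is_valid("[x for x in range(5) if x > 1 if x < 4]"))       # True
print(is_valid("[x if x > 1 for x in range(5)]"))                # False
```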

Rule 4: Order of Operations

In generator expressions, the iterable of the outermost for clause is evaluated immediately, when the generator is created. The evaluation of the other clauses is deferred until the generator runs. As a result, the outermost for clause may raise an exception even before the first call to next().

The expression:

( i for i in (lambda x, y : range(x, y, 2))(3, 15) )

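has its outermost iterable, (lambda x, y : range(x, y, 2))(3, 15), evaluated to range(3, 15, 2) as soon as the generator is created, even before the first call to next(). The rest of the expression is evaluated as the generator is consumed, producing the values 3, 5, 7, 9, 11 and 13.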
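The sketch below illustrates Rule 4 with three small cases. First, an invalid outermost iterable fails as soon as the generator is created. Second, an error in the element expression only surfaces when the offending value is requested. Third, rebinding the name of the outermost iterable after creation has no effect, because the iterator was already taken.

```python
# 1. The outermost iterable is evaluated (and iter() applied) at creation.
creation_error = None
try:
    gen = (x for x in None)
except TypeError as exc:
    creation_error = type(exc).__name__
print(creation_error)  # TypeError, raised before any next() call

# 2. The element expression is deferred until a value is requested.
gen = (1 / x for x in [1, 0])  # no error yet
first = next(gen)              # 1.0
deferred_error = None
try:
    next(gen)                  # now 1 / 0 is evaluated
except ZeroDivisionError as exc:
    deferred_error = type(exc).__name__
print(first, deferred_error)   # 1.0 ZeroDivisionError

# 3. The outermost iterable is captured when the generator is created.
data = [1, 2]
gen = (x for x in data)
data = [3]                     # rebinding the name doesn't affect gen
captured = list(gen)
print(captured)                # [1, 2]
```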

Closing Thoughts

Throughout this tutorial we presented several Python comprehension and generator expression examples. These examples show that comprehensions are nothing more than an extended notation for their respective displays. They let us specify the contents of a list, set or dict without listing its items one by one. Generator expressions are a memory-efficient way of iterating over an iterable, producing one item at a time instead of building the whole result in memory.

Did you find this free tutorial helpful? Share this article with your friends, classmates, and coworkers on Facebook and Twitter! When you spread the word on social media, you’re helping us grow so we can continue to provide free tutorials like this one for years to come.

