Python Data Structures Motivating Example

Suppose you are once again a teacher in a geography class. Each of your five students picked a country, and the following table was prepared from their reports:

Student   Country      GDP (billion USD)   Population   In Africa?
Mary      Luxembourg   64.2                602005       False
Matthew   Eritrea      6.856               4954645      True
Marie     None         None                None         None
Manuel    Lesotho      2.721               2203821      True
Malala    Poland       614.190             38433600     False

In our previous tutorial on Python Data Types, we had to manually type out this information for each calculation. For a small table of only 5 rows this wasn’t a major imposition, but what if we had 300 students? What about 50,000? We can use data structures to capture this information and use it efficiently for our calculations.


Introduction to Python Data Structures

This tutorial will cover common data structures included with Python. Each of these structures has a wide range of applications and options, so only the basic features of each will be discussed here. Advanced features will be explored in future courses. Python has the following built-in data structures:

  • Python Lists
  • Python Dictionaries
  • Python Sets
  • Python Tuples

We’ll cover each of these Python data structures in turn during this tutorial.



Python Lists

Lists are the workhorse data structure of Python. Python does not have a built-in array type, so array operations are typically performed using lists (see note 1).

Storing Objects in Lists

Lists are defined as a sequence of objects between square brackets [], separated by commas. For example, all of the following are valid lists:

[1, 2, 3, 4, 5]
[1.0, 1.1, 1.2, 1.3,]
["a", "b", "c"]
[True, False]
[None, None]
[1, 1.0, "a", True, None]

Note that in the last line of the above example, a mixture of data types can be contained within the same list. A trailing comma may also be used after the last entry of a list.

Any Python object can be stored in a list. For example, range() creates a range object, an iterable sequence of integers, which can be held in a list. Iteration will be discussed in a later tutorial.

[range(1,5), range(5,10), range(10,15)]
> [range(1, 5), range(5, 10), range(10, 15)]

Since lists are themselves Python objects, they can also be stacked within lists:

[[1, 2, 3], 
 [4, 5, 6], 
 [7, 8, 9]]
> [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Note that in the above example, square brackets, like parentheses, allow input to extend over multiple lines. Spreading two-dimensional lists like the one above over multiple lines makes the code easier to read.

The type() function can be used to identify an object’s data structure:

l = [1, 2, 3, 4, 5]
type(l)
> list

Addressing List Elements

Elements within a list can be addressed in the same way as characters within a string, by indexing and slicing:

x = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
print(x[0]) #Note Python begins indexing at 0
> a
print(x[8]) #Last element will be length - 1
> i
print(x[3:5])  #Selects a range of elements, returned as a list
> ['d', 'e']
print(x[5:])  #Selects elements starting at index 5 to the end of list
> ['f', 'g', 'h', 'i']
print(x[:3])  #Selects elements from beginning of list to index 3, exclusive.
> ['a', 'b', 'c']
print(x[-1])  #Selects the last element
> i
print(x[-5:-3])  #Negative integers select elements relative to the last index
> ['e', 'f']

If lists are stacked, then multiple indices can be used to address elements within sub-lists:

x = [[1, 2, 3], 
     [4, 5, 6], 
     [7, 8, 9]]
print(x[1]) #Addresses sublist 1
> [4, 5, 6]
print(x[1][0]) #Addresses sublist 1, element 0
> 4
print(x[1][1:])
> [5, 6]

Unlike strings, lists are mutable; elements within a list can be changed:

x = ["a", "b", "c", "d",]
x[0] = 1.0
print(x)
> [1.0, 'b', 'c', 'd']

Addressing indices not present in a list will raise an IndexError.

x[42]
> IndexError: list index out of range
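
One way to avoid an IndexError is to compare the index against len() before addressing the element. The following is a minimal sketch of that idea; the helper name get_or_default is hypothetical and is not part of Python itself.

```python
def get_or_default(items, index, default=None):
    """Return items[index] if the index is valid, otherwise default.

    Hypothetical helper for illustration only.
    """
    # Valid indices run from -len(items) up to len(items) - 1
    if -len(items) <= index < len(items):
        return items[index]
    return default

x = ["a", "b", "c", "d"]
first = get_or_default(x, 0)      # valid index
missing = get_or_default(x, 42)   # out of range, falls back to None
last = get_or_default(x, -1)      # negative indices count from the end
```

The check mirrors the indexing rules shown earlier: a list of length 4 accepts indices 0 through 3 and -4 through -1.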

List Operations

This introduction to lists covers only basic list operations; the many other operations that can be performed on lists will be discussed in detail in later tutorials. The sum(list) function will output the sum of a list of numeric values:

x = [1.0, 1.1, 1.2, 1.3,]
sum(x)
> 4.6

The len(list) function will give the length of a list:

x = ["apple", "blueberry", "cherry",]
len(x)
> 3

The list.append(object) method will append a given object to the end of a list:

x = ["apple", "blueberry", "cherry",]
x.append(3.14)
print(x)
> ['apple', 'blueberry', 'cherry', 3.14]

The list.insert(i, object) method will insert an object at index i of the given list:

x = ["apple", "blueberry", "cherry",]
x.insert(0, "blackberry") # Inserts element at front of list
print(x)
> ['blackberry', 'apple', 'blueberry', 'cherry']
x.insert(len(x), 3.14) # Insert element at end of list, like list.append()
print(x)
> ['blackberry', 'apple', 'blueberry', 'cherry', 3.14]
x.insert(500, "mincemeat") # An index beyond the end of the list appends the element
print(x)
> ['blackberry', 'apple', 'blueberry', 'cherry', 3.14, 'mincemeat']

The del statement can delete given elements of a list (see note 2). The parentheses used in the example below are optional.

x = ["apple", "blueberry", "cherry",]
del(x[1:])
print(x)
> ['apple']
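
The operations above are often combined. As a small sketch, the following builds a list of student names step by step using append, insert, and del; the "Placeholder" entry is invented purely to demonstrate deletion.

```python
# Start with an empty list and build it up step by step
students = []
students.append("Mary")          # ['Mary']
students.append("Manuel")        # ['Mary', 'Manuel']
students.insert(1, "Matthew")    # ['Mary', 'Matthew', 'Manuel']
students.append("Placeholder")   # temporary entry, removed on the next line
del students[-1]                 # delete the last element

count = len(students)            # number of names remaining
```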

Python List Example

Suppose we wish to input the table from our motivating example into a list. We can manually create the list with the following:

table = [["Mary", "Luxembourg", 64.2, 602005, False],
         ["Matthew", "Eritrea", 6.856, 4954645, True],
         ["Marie", None, None, None, None],
         ["Manuel", "Lesotho", 2.721, 2203821, True],
         ["Malala", "Poland", 614.190, 38433600, False],
        ]

Now, we can return values within the table without manually typing them:

table[0][:] # Return the first row
> ['Mary', 'Luxembourg', 64.2, 602005, False]
table[-1][0:2] # Last row, first and second elements.
> ['Malala', 'Poland']

Suppose Marie has finally turned in her project, and we want to add it to the table:

table[2] = ['Marie', 'Swaziland', 3.938, 1343098, True]
print(table)
> [['Mary', 'Luxembourg', 64.2, 602005, False],
>  ['Matthew', 'Eritrea', 6.856, 4954645, True],
>  ['Marie', 'Swaziland', 3.938, 1343098, True],
>  ['Manuel', 'Lesotho', 2.721, 2203821, True],
>  ['Malala', 'Poland', 614.19, 38433600, False]]

Suppose we wanted to extract just the column of GDPs from the table. We might expect a second index to act as a column selector, but this does not work:

table[:][2]
> ['Marie', 'Swaziland', 3.938, 1343098, True]

The result is instead the row at index 2 of the table. This is because each index addresses the list or sublist to its immediate left; in other words, indices are applied in order from left to right. In the above example, table[:] returned a list consisting of all rows from the table (what we started with), and table[:][2] selected index 2 from this list, which is just the third row of the original list. We can instead select the GDP column by using a list comprehension, a kind of loop that will be discussed in detail later.

GDP = [x[2] for x in table]
print(GDP)
> [64.2, 6.856, 3.938, 2.721, 614.19]
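
For readers who have not met list comprehensions yet, the comprehension above is roughly equivalent to the explicit loop sketched below. The variable names gdp_loop and gdp_comprehension are chosen here for illustration.

```python
# Table from the example, after Marie's row was filled in
table = [["Mary", "Luxembourg", 64.2, 602005, False],
         ["Matthew", "Eritrea", 6.856, 4954645, True],
         ["Marie", "Swaziland", 3.938, 1343098, True],
         ["Manuel", "Lesotho", 2.721, 2203821, True],
         ["Malala", "Poland", 614.190, 38433600, False]]

# Visit each row in turn and collect the element at index 2 (the GDP)
gdp_loop = []
for row in table:
    gdp_loop.append(row[2])

# The comprehension expresses the same loop in a single line
gdp_comprehension = [row[2] for row in table]
```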

We can then use the sum() and len() functions to find the average GDP:

GDP_mean = sum(GDP)/len(GDP)
print(round(GDP_mean, 3))
> 138.381
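
Had Marie's row still been empty, sum() would have raised a TypeError on the None entry. One possible workaround, sketched here against the original table, is to filter out the missing values before averaging. The names gdp_known and gdp_mean_known are illustrative.

```python
# Original table, before Marie turned in her project
table = [["Mary", "Luxembourg", 64.2, 602005, False],
         ["Matthew", "Eritrea", 6.856, 4954645, True],
         ["Marie", None, None, None, None],
         ["Manuel", "Lesotho", 2.721, 2203821, True],
         ["Malala", "Poland", 614.190, 38433600, False]]

# Keep only the rows that actually report a GDP
gdp_known = [row[2] for row in table if row[2] is not None]

# Average over the four reported values only
gdp_mean_known = sum(gdp_known) / len(gdp_known)
```

Whether to skip missing rows or treat them differently is a judgment call that depends on the calculation; skipping is only one option.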


Python Dictionaries

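Dictionaries in Python are used to store key:value pairs. Each value is looked up using its associated key rather than a numeric position. (Since Python 3.7, dictionaries preserve the order in which keys were inserted, but they are still not addressed by index.) Dictionaries are defined using curly braces {}.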

d = {"a":1, "b":2, "c":3, "d":4}
d
> {'a': 1, 'b': 2, 'c': 3, 'd': 4}
type(d)
> dict

All keys must be unique in a dictionary. Defining a key twice will overwrite the initial value:

d = {"a":1, "b":2, "c":3, "d":4, "a":500}
d
> {'a': 500, 'b': 2, 'c': 3, 'd': 4}

Addressing Dictionary Elements

All values within the dictionary are addressed by their keys, rather than index:

d["c"]
> 3

Attempting to address a dictionary value with a key that does not exist will raise a KeyError:

d[1]
> KeyError: 1
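
When it is not certain that a key exists, the in operator and the dict.get() method offer ways to look up a value without raising a KeyError. A brief sketch:

```python
d = {"a": 1, "b": 2, "c": 3, "d": 4}

has_c = "c" in d                  # True: the key exists
has_z = "z" in d                  # False: the key is missing

value_c = d.get("c")              # 3
value_z = d.get("z")              # None, instead of raising KeyError
value_z_default = d.get("z", 0)   # 0, the supplied fallback value
```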

Note: Integers should be avoided as keys to prevent confusion with list indices. For example,

d = {2:"apple", 4:"blueberry", 3:"cherry", 1:3.14}
d[1]
> 3.14

Here, the value corresponding to the key 1 was returned, rather than the first value in the dictionary.

Dictionary Operations

Unlike lists, dictionary elements can be added directly through assignment:

d = {"a":1, "b":2, "c":3, "d":4}
d["e"] = 5
d
> {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

The keys of a dictionary can be returned as a list using the list(object) function.

d = {"c":1, "a":2, "b":3, "d":4}
list(d)
> ['c', 'a', 'b', 'd']

The keys can also be returned in sorted order using the sorted(object) function.

d = {"c":1, "a":2, "b":3, "d":4}
sorted(d)
> ['a', 'b', 'c', 'd']
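
Besides the keys, the values and the key:value pairs can be retrieved with the keys(), values(), and items() methods. The sketch below wraps each in list() so the results can be inspected; the pairs come back as tuples, which are covered later in this tutorial.

```python
d = {"c": 1, "a": 2, "b": 3, "d": 4}

keys = list(d.keys())       # same result as list(d)
values = list(d.values())   # values in the same order as the keys
pairs = list(d.items())     # list of (key, value) tuples

# sorted() on the pairs orders them by key
sorted_pairs = sorted(d.items())
```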

Python Dictionary Example

Suppose we want to store the table from our classroom example, but with additional information to make accessing information easier. We can create the following list of dictionaries:

d = [{"Name":"Mary", "Country":"Luxembourg", "GDP":64.2, "Pop":602005, "A?":False},
     {"Name":"Matthew", "Country":"Eritrea", "GDP":6.856, "Pop":4954645, "A?":True},
     {"Name":"Marie", "Country":None, "GDP":None, "Pop":None, "A?":None},
     {"Name":"Manuel", "Country":"Lesotho", "GDP":2.721, "Pop":2203821, "A?":True},
     {"Name":"Malala", "Country":"Poland", "GDP":614.190, "Pop":38433600, "A?":False},
    ]

Now we can make direct queries to each row’s data:

d[1]["GDP"]
> 6.856
d[-1]
> {'Name': 'Malala',
>  'Country': 'Poland',
>  'GDP': 614.19,
>  'Pop': 38433600,
>  'A?': False}
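
Named keys also make column-style queries easier to read. As a sketch, the list comprehensions below pull out the reported GDP values and the names of students whose country is in Africa. The variable names students, gdp_known, and african are chosen here for illustration.

```python
students = [
    {"Name": "Mary", "Country": "Luxembourg", "GDP": 64.2, "Pop": 602005, "A?": False},
    {"Name": "Matthew", "Country": "Eritrea", "GDP": 6.856, "Pop": 4954645, "A?": True},
    {"Name": "Marie", "Country": None, "GDP": None, "Pop": None, "A?": None},
    {"Name": "Manuel", "Country": "Lesotho", "GDP": 2.721, "Pop": 2203821, "A?": True},
    {"Name": "Malala", "Country": "Poland", "GDP": 614.190, "Pop": 38433600, "A?": False},
]

# GDP of every row that reports one (Marie's None is skipped)
gdp_known = [row["GDP"] for row in students if row["GDP"] is not None]

# Names of students whose "A?" flag is True (None and False are both skipped)
african = [row["Name"] for row in students if row["A?"]]
```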


Python Sets

Sets are used to implement the properties associated with mathematical sets. A set is an unordered collection of unique elements. Like dictionaries, sets are written with curly braces {}, but they contain only elements rather than key:value pairs.

s = {1, 2.3, "apple", False, None}
s
> {1, 2.3, False, None, 'apple'}
type(s)
> set

Sets contain only unique elements, therefore the following two sets are equivalent:

s1 = {1, 2.3, "apple", False, None}
s2 = {1, 2.3, "apple", 1, False, None, "apple"}
s1 == s2
> True

Strings can also be converted into a set of characters using the set(object) function.

s = set("The best of times.")
s
> {' ', '.', 'T', 'b', 'e', 'f', 'h', 'i', 'm', 'o', 's', 't'}
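
The same conversion works on lists, which makes set() a convenient way to remove duplicates. The list of countries below is invented for illustration.

```python
# Hypothetical list with repeated entries
countries = ["Poland", "Lesotho", "Poland", "Eritrea", "Lesotho"]

unique_countries = set(countries)          # duplicates collapse to one element each
unique_sorted = sorted(unique_countries)   # back to a list, in alphabetical order
```

Because sets are unordered, sorted() is used here to get a predictable ordering back.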

Set Operations

An element’s existence within a set can be determined with the in operator:

s = {1, 2.3, "apple", False, None}
"apple" in s
> True
"cherry" in s
> False

Standard set operations can be used:

a = {1, 2, 3, 4, 5, 6}
b = {4, 5, 6, 7, 8, 9}
a & b #Intersection: Elements in a and b
> {4, 5, 6}
a | b #Union: Elements in a or b
> {1, 2, 3, 4, 5, 6, 7, 8, 9}
a - b #Difference: Elements in a that are not in b
> {1, 2, 3}
a ^ b #Symmetric Difference: Elements in a or b, but not in both
> {1, 2, 3, 7, 8, 9}
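
Each of these operators also has an equivalent named method, which some readers find easier to read. A short sketch using the same two sets:

```python
a = {1, 2, 3, 4, 5, 6}
b = {4, 5, 6, 7, 8, 9}

inter = a.intersection(b)          # same as a & b
union = a.union(b)                 # same as a | b
diff = a.difference(b)             # same as a - b
sym = a.symmetric_difference(b)    # same as a ^ b

# issubset() tests whether every element of one set is in another
is_sub = {4, 5}.issubset(a)
```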

Python Set Example

Suppose in addition to the geography class in the example above, you also teach a class in social studies. The following two sets contain the names of students enrolled in each class:

geography = {"Mary", "Matthew", "Marie", "Manuel", "Malala"}
social_studies = {"Novak", "Nathan", "Marie", "Nora", "Malala"}

Having these sets, we can perform an analysis on the two classes:

geography & social_studies #Students in both classes
> {'Malala', 'Marie'}
social_studies - geography #Students only in social studies
> {'Nathan', 'Nora', 'Novak'}
len(geography | social_studies) #Number of unique students across both classes
> 8
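
Two further queries on the same sets, sketched under the assumption that a sorted roster is wanted for display: the students enrolled in exactly one class, and an alphabetical list of everyone.

```python
geography = {"Mary", "Matthew", "Marie", "Manuel", "Malala"}
social_studies = {"Novak", "Nathan", "Marie", "Nora", "Malala"}

# Students enrolled in exactly one of the two classes
only_one_class = geography ^ social_studies

# Alphabetical roster of every student across both classes
roster = sorted(geography | social_studies)
```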

Python Tuples

Tuples are ordered sequences of objects. They are most frequently used to pass objects to and from functions. A tuple is written as a sequence of objects separated by commas, either with no enclosing brackets or enclosed in round parentheses ().

t = 1, 2, 3, 4, 5, 6
t
> (1, 2, 3, 4, 5, 6)
type(t)
> tuple

You may notice that the comma-separated arguments in a function call look much like a tuple. Many functions with multiple outputs return those outputs in a tuple. For example, divmod(a, b) accepts two numbers a and b, and returns the tuple (q, r) holding the quotient q and remainder r of a divided by b.

divmod(28, 5)
> (5, 3)
q, r = divmod(28, 5)
print(q)
> 5
print(r)
> 3

Notice in the above example we used the variables q and r to “catch” the values of the tuple that were output from divmod. If we used only a single variable in the assignment, then that variable would hold the whole tuple.
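
The difference between catching the whole tuple and unpacking it can be sketched as follows. The variable names result and check are illustrative.

```python
# A single name on the left receives the whole tuple
result = divmod(28, 5)

# Several names on the left unpack the tuple element by element
q, r = result

# The quotient and remainder reconstruct the original number
check = q * 5 + r

# Unpacking also allows two variables to be swapped in one line
a, b = 1, 2
a, b = b, a
```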

Tuple elements can be addressed similarly to list elements:

t = 1, 2, 3, 4, 5, 6
t[2]
> 3
t[2:5]
> (3, 4, 5)

Tuples are immutable; values within a tuple cannot be changed once the tuple is created.

t = 1, 2, 3, 4, 5, 6
t[2] = 7
> TypeError: 'tuple' object does not support item assignment
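
Since a tuple cannot be changed in place, the usual approach is to build a new tuple instead. Two possible ways are sketched below: slicing and concatenating, or converting to a list and back. Note that a one-element tuple needs a trailing comma, as in (7,).

```python
t = 1, 2, 3, 4, 5, 6

# Option 1: join the slices around index 2 with a one-element tuple
t_new = t[:2] + (7,) + t[3:]

# Option 2: convert to a mutable list, edit it, and convert back
as_list = list(t)
as_list[2] = 7
t_from_list = tuple(as_list)
```

In both cases the original tuple t is left untouched.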

Python Data Structures Exercises

  1. What data structures are the following lines?

x = 1, 2, 3, 4, 5, 6
x = [1, 2, 3, 4, 5, 6]
x = {1, 2, 3, 4, 5, 6}
x = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}

  2. Is the following a valid dictionary?

x = {1, "b":2, "c":3}

  3. Create a new nested list that puts the elements of the following stacked list in alphabetical order, using only indices into the original list.

u = [["d", "g", "c"], 
     ["b", "i", "e"], 
     ["h", "f", "a"]]

  4. Given the following list:

x = ["It", "blueberry", "the", "best", "."]

Change the list into the following one, using only the del statement and the x.insert and x.append methods.

x = ["It", "was", "the", "best", "of", "times."]

  5. Based on the geography and social studies example from above, write a single line that returns True if Marie is in both classes, or False otherwise.



Solutions

1.

> tuple
> list
> set
> dict

2. No. Each dictionary element must be a key:value pair. The bare element 1 has no value, so Python raises a SyntaxError.

3.

v = [[u[2][2], u[1][0], u[0][2]], 
     [u[0][0], u[1][2], u[2][1]], 
     [u[0][1], u[2][0], u[1][1]]]
v
> [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]

4. There are multiple ways to edit the list:

x = ["It", "blueberry", "the", "best", "."]
del(x[1])
del(x[-1])
x.insert(1, "was")
x.append("of")
x.append("times.")
x
> ['It', 'was', 'the', 'best', 'of', 'times.']

5.

geography = {"Mary", "Matthew", "Marie", "Manuel", "Malala"}
social_studies = {"Novak", "Nathan", "Marie", "Nora", "Malala"}
"Marie" in (geography & social_studies)
> True


Notes

  1. The Python package numpy implements arrays, which are widely used in numerical and scientific computing. For many everyday tasks, a list can stand in for an array. The numpy package and its array features will be discussed in a later tutorial.

  2. The del statement is not limited to lists; it can also delete dictionary entries and unbind variable names.