Introduction | Appending Rows | Joining Columns

Introduction to Merging Multiple DataFrames with Pandas concat

This tutorial describes the creation of new Pandas DataFrames by combining, or merging, multiple separate DataFrames using the Pandas Concat function. We will cover combining DataFrame rows from tables with the same number of columns, and combining DataFrame columns from tables with the same number of rows.

The concat function has the following format:

[Joined DataFrame] = pandas.concat([List of DataFrames], axis=[axis], keys=[keys], ignore_index=[Index Bool], [Other])`

Each of the parameters of the Pandas concat function are described in the table below:

[Joined DataFrame] The final concatenated DataFrame.
[List of DataFrames] A list or dictionary of DataFrames to be joined.
[axis] Axis on which DataFrames are joined.
0 for vertical concatenation (joining rows with common columns). This is the default
1 for horizontal concatenation (joining columns with common rows).
[keys] A list of keys associated with each input DataFrame. The final [Joined DataFrame] will have key labels corresponding to the original separate DataFrames.
[Index Bool] Boolean value indicating whether the input DataFrame indices should be ignored and replaced with automated numbering. Default is False, meaning index values are not ignored.
[Other] Other advanced options are available, but they're rarely used and would just muddy the waters in the tutorial. If you really want to see what they are, the Pandas Manual may be referenced for further information.

To illustrate the use of the concat function, we’ll assume that we have the following three DataFrames available for use:

import pandas as pd # Don't forget to import Pandas!
# df1, df2, and df3 are already defined

df1.head()
>         A   B   C
> Index            
> 1      A1  B1  C1
> 2      A2  B2  C2
> 3      A3  B3  C3
> 4      A4  B4  C4
> 5      A5  B5  C5

df2.head()  # df2 contains rows extending df1
>          A    B    C
> Index               
> 7       A7   B7   C7
> 8       A8   B8   C8
> 9       A9   B9   C9
> 10     A10  B10  C10
> 11     A11  B11  C11

df3.head() # df3 contains columns extending df1
>         D   E   F
> Index            
> 1      D1  E1  F1
> 2      D2  E2  F2
> 3      D3  E3  F3
> 4      D4  E4  F4
> 5      D5  E5  F5

Now let’s use the Pandas concat function to merge these DataFrames!


Appending Rows with Pandas concat

Let’s look at appending DataFrame rows. To append rows, the concat option axis is set to 0. Because 0 is the default setting, we can omit the option entirely and create a new merged DataFrame vdf:

vdf = pd.concat([df1, df2])
print(vdf)
>          A    B    C
> Index               
> 1       A1   B1   C1
> 2       A2   B2   C2
> 3       A3   B3   C3
> 4       A4   B4   C4
> 5       A5   B5   C5
> 6       A6   B6   C6
> 7       A7   B7   C7
> 8       A8   B8   C8
> 9       A9   B9   C9
> 10     A10  B10  C10
> 11     A11  B11  C11
> 12     A12  B12  C12

Get Our Python Developer Kit for Free

I put together a Python Developer Kit with over 100 pre-built Python scripts covering data structures, Pandas, NumPy, Seaborn, machine learning, file processing, web scraping and a whole lot more - and I want you to have it for free. Enter your email address below and I'll send a copy your way.

Yes, I'll take a free Python Developer Kit

If we wish to track the original DataFrames from which vdf was created, we can use the keys option as follows:

vdf = pd.concat([df1, df2], keys=["df1", "df2"])
print(vdf)
>              A    B    C
>     Index               
> df1 1       A1   B1   C1
>     2       A2   B2   C2
>     3       A3   B3   C3
>     4       A4   B4   C4
>     5       A5   B5   C5
>     6       A6   B6   C6
> df2 7       A7   B7   C7
>     8       A8   B8   C8
>     9       A9   B9   C9
>     10     A10  B10  C10
>     11     A11  B11  C11
>     12     A12  B12  C12

You can see the labels of the original Pandas DataFrames are clearly presented at the start of each table.

Note that indices are automatically imported from the source DataFrames, meaning that duplicate indices can occur. For example, what would happen if we used the Pandas concat function to merge two identical dataframes?

vdf = pd.concat([df1, df1]) # Suppose we join df1 to itself
print(vdf)
>         A   B   C
> Index            
> 1      A1  B1  C1
> 2      A2  B2  C2
> 3      A3  B3  C3
> 4      A4  B4  C4
> 5      A5  B5  C5
> 6      A6  B6  C6
> 1      A1  B1  C1
> 2      A2  B2  C2
> 3      A3  B3  C3
> 4      A4  B4  C4
> 5      A5  B5  C5
> 6      A6  B6  C6

In the resulting DataFrame above, we have duplicate index values. Each value is repeated twice since we merged the same DataFrame twice.

In tables containing separate information, we may want to renumber the resulting indices to avoid duplicate entries. We can use the ignore_index flag to renumber the output indices in the final DataFrame. Let’s try to merge the df1 DataFrame with itself again, but this time we’ll ignore the original index values and create our own.

vdf = pd.concat([df1, df1], ignore_index=True)
print(vdf)
>      A   B   C
> 0   A1  B1  C1
> 1   A2  B2  C2
> 2   A3  B3  C3
> 3   A4  B4  C4
> 4   A5  B5  C5
> 5   A6  B6  C6
> 6   A1  B1  C1
> 7   A2  B2  C2
> 8   A3  B3  C3
> 9   A4  B4  C4
> 10  A5  B5  C5
> 11  A6  B6  C6

See how the index values now start from 0 and go to 11? There are no duplicates, and this example also serves as a reminder that indexing starts at 0.


Joining Columns with Pandas concat

Now that we’ve learned how to stack two DataFrames on top of each other, let’s look at appending DataFrame columns to create a merged DataFrame (side by side). To append columns, the concat option axis is set to 1. Let’s look at appending DataFrames df3 to df1.

hdf = pd.concat([df1, df3], axis=1)
print(hdf)
>         A   B   C   D   E   F
> Index                        
> 1      A1  B1  C1  D1  E1  F1
> 2      A2  B2  C2  D2  E2  F2
> 3      A3  B3  C3  D3  E3  F3
> 4      A4  B4  C4  D4  E4  F4
> 5      A5  B5  C5  D5  E5  F5
> 6      A6  B6  C6  D6  E6  F6

An important note to make about the concat function is that by default joins are performed using an “outer” join, which will keep all unmatched values and insert a null value in each position where a value is missing.

For example, if we join df2 with df3, where both tables are missing common indices and columns, then we will get a set of null values where the indices and columns don’t overlap:

hdf = pd.concat([df3, df2])
print(hdf)
>          A    B    C    D    E    F
> Index                              
> 1      NaN  NaN  NaN   D1   E1   F1
> 2      NaN  NaN  NaN   D2   E2   F2
> 3      NaN  NaN  NaN   D3   E3   F3
> 4      NaN  NaN  NaN   D4   E4   F4
> 5      NaN  NaN  NaN   D5   E5   F5
> 6      NaN  NaN  NaN   D6   E6   F6
> 7       A7   B7   C7  NaN  NaN  NaN
> 8       A8   B8   C8  NaN  NaN  NaN
> 9       A9   B9   C9  NaN  NaN  NaN
> 10     A10  B10  C10  NaN  NaN  NaN
> 11     A11  B11  C11  NaN  NaN  NaN
> 12     A12  B12  C12  NaN  NaN  NaN

This behavior can be altered by changing the join option. Subscribe to our Python tutorials below because we will be discussing more about join methods in future tutorials on relational tables.


Get Our Python Developer Kit for Free

I put together a Python Developer Kit with over 100 pre-built Python scripts covering data structures, Pandas, NumPy, Seaborn, machine learning, file processing, web scraping and a whole lot more - and I want you to have it for free. Enter your email address below and I'll send a copy your way.

Yes, I'll take a free Python Developer Kit

Did you find this free tutorial helpful? Share this article with your friends, classmates, and coworkers on Facebook and Twitter! When you spread the word on social media, you’re helping us grow so we can continue to provide free tutorials like this one for years to come.