Python Pandas read_excel to Import Excel File Into DataFrame

The Python Pandas read_excel function lets you import data from Excel files into a Pandas DataFrame. This tutorial includes several Pandas read_excel examples to teach you how to properly use the function to read Excel files in Python Pandas. It’s closely related to our article about the Pandas read_csv function, but this tutorial focuses exclusively on reading Excel files.

Importing Excel Files into a Pandas DataFrame
Pandas read_excel Function
- Important Pandas read_excel Options
Pandas read_excel Examples

Importing Excel Files into a Pandas DataFrame

Data from an Excel formatted file can be retrieved using the Pandas from_excel function. This is a really neat way to read Excel files in Python, so let’s walk through a couple examples. I especially like it because Pandas serves as the bridge between my VBA Tutorials and Python Tutorials.

Suppose we have the sample Excel file Sample.xlsx stored in the same location as our Python script. The sample Excel file contains the following data within sheet Sheet1:

Sample Excel File

We can quickly import this spreadsheet data into Python using the following code:

import pandas as pd
df = pd.read_excel('Sample.xlsx', sheet_name='Sheet1')
print(df)
>   Row Name  Column A  Column B   Column C
> 0    Row 1         1          2         3
> 1    Row 2         4          5         6
> 2    Row 3         7          8         9

The above code easily read and imported the Excel file into a Pandas DataFrame, but it did so using several default settings that we should consider changing.

For example, the Row Name column values were imported as data, while we may want them to be read as index labels. Chances are each row in your spreadsheet isn’t going to be labeled like in our example, but it’s still something to keep in mind.

The default read_excel parameters also assumed that the first row was a list of column names, which it incorporated automatically as column labels within the DataFrame.

Fortunately, if we want to change how read_excel reads in these files, we have a several argument options we can play around with.

Note: If your Excel file isn’t stored in the same location as your Python script, you would replace Sample.xlsx with the full path to your spreadsheet.

Pandas read_excel Function

The Pandas read_excel function contains numerous options for importing data. We’re going to address the essential options for converting Excel files to Pandas DataFrames. These options are extremely common and really important to understand since they can drastically change how the read_excel function behaves.

Here are a few the options you’ll want to understand when working with the read_excel function in Pandas. These options follow the format

read_excel(io, [Other Options])

Important Pandas read_excel Options

Argument	Description
io	A string containing the pathname of the given Excel file.
sheet_name	The Excel sheet name, or sheet number, of the data you want to import. The sheet number can be an integer where 0 is the first sheet, 1 is the second, etc. If a list of sheet names/numbers are given, then the output will be a dictionary of DataFrames. The default is to read all the sheets and output a dictionary of DataFrames.
header	Row number to use for the list of column labels. The default is 0, indicating that the first row is assumed to contain the column labels. If the data does not have a row of column labels, None should be used.
names	A separate Python list input of column names. This option is None by default. This option is the equivalent of assigning a list of column names to the columns attribute of the output DataFrame.
index_col	Specifies which column should be used for row indices. The default option is None, meaning that all columns are included in the data, and a range of numbers is used as the row indices.
usecols	An integer, list of integers, or string that specifies the columns to be imported into the DataFrame. The default is to import all columns. If a string is given, then Pandas uses the standard Excel format to select columns (e.g. "A:C,F,G" will import columns A, B, C, F, and G).
skiprows	The number of rows to skip at the top of the Excel sheet. Default is 0. This option is useful for skipping rows in Excel that contain explanatory information about the data below it.

Pandas read_excel Examples

Naming Rows in your Excel DataFrame

Suppose in our earlier example we wanted to use the “Row Name” column as a series of row indices (instead of it defaulting to 0, 1, 2, etc.). We can use the read_excel index_col option to automatically set the row index labels:

import pandas as pd
df = pd.read_excel('Sample.xlsx', sheet_name='Sheet1', index_col=0)
print(df)
>           Column A  Column B   Column C
> Row Name                               
> Row 1            1          2         3
> Row 2            4          5         6
> Row 3            7          8         9

Code More, Distract Less: Support Our Ad-Free Site

You might have noticed we removed ads from our site - we hope this enhances your learning experience. To help sustain this, please take a look at our Python Developer Kit and our comprehensive cheat sheets. Each purchase directly supports this site, ensuring we can continue to offer you quality, distraction-free tutorials.

Get Our Python Dev Kit

Naming Columns in your Excel DataFrame

Alternatively, you can use the names option to specify your own set of column names. This way, you don’t have to rely on the column names specified in your spreadsheet.

import pandas as pd
colNames = ["Better A", "Better B", "Better C"]
df = pd.read_excel('Sample.xlsx', sheet_name='Sheet1', names=colNames, index_col=0)
print(df)
>           Better A  Better B  Better C
> Row Name                              
> Row 1            1         2         3
> Row 2            4         5         6
> Row 3            7         8         9

Skipping rows and headers

We can use the header and skiprows option to tell Pandas not to use the column labels already in our spreadsheet. Note that both header=None and skiprows=1 must be set in this case, as the first row of actual data will be skipped if the header option is missing.

import pandas as pd
df = pd.read_excel('Sample.xlsx', sheet_name='Sheet1', header=None, skiprows=1, index_col=0)
print(df)
>        1  2  3
> 0             
> Row 1  1  2  3
> Row 2  4  5  6
> Row 3  7  8  9

Specifying which columns to import

Finally, if we want to eliminate the index label column entirely and only import certain columns of data, we can tell the usecols option to only read certain columns from our Excel spreadsheet. In this example, we tell Pandas to only import the second and fourth columns ("B,D" in Excel):

import pandas as pd
df = pd.read_excel('Sample.xlsx', sheet_name='Sheet1', header=None, skiprows=1, usecols="B,D")
print(df)
>    0  1
> 0  1  3
> 1  4  6
> 2  7  9

How to import multiple sheets into Pandas

If no sheet_name is specified in our call to the read_excel function, all your sheets will be imported and the output will be a dictionary of DataFrames. The data in each sheet can be referenced by calling the corresponding “key,” which is just the sheet name or sheet number (starting at 0).

If you only want to import a couple sheets, rather than all the sheets, you have a couple options. Your first option is to just omit the sheet_name argument and work with the output dictionary to grab the sheet data you want. Again, the “key” in the dictionary is just the sheet name or sheet number and the corresponding value of each key is the DataFrame containing the sheet’s data.

Your second option is to pair the Pandas read_excel function with the ExcelFile function to store the data from each sheet in different DataFrames. You could run the code without the ExcelFile line, but I find it makes the code cleaner since you only have to specify the spreadsheet path once. Here’s an example:

import pandas as pd
xls = pd.ExcelFile('Sample.xlsx')
df1 = pd.read_excel(xls, sheet_name='Sheet1')
df2 = pd.read_excel(xls, sheet_name='Sheet2')