The Python Pandas read_excel function lets you import data from Excel files into a Pandas DataFrame. This tutorial includes several Pandas read_excel examples to teach you how to properly use the function to read Excel files in Python Pandas. It’s closely related to our article about the Pandas read_csv function, but this tutorial focuses exclusively on reading Excel files.
Importing Excel Files into a Pandas DataFrame
Data from an Excel formatted file can be retrieved using the Pandas from_excel
function. This is a really neat way to read Excel files in Python, so let’s walk through a couple examples. I especially like it because Pandas serves as the bridge between my VBA Tutorials and Python Tutorials.
Suppose we have the sample Excel file
We can quickly import this spreadsheet data into Python using the following code:
import pandas as pd
df = pd.read_excel('Sample.xlsx', sheet_name='Sheet1')
print(df)
> Row Name Column A Column B Column C
> 0 Row 1 1 2 3
> 1 Row 2 4 5 6
> 2 Row 3 7 8 9
The above code easily read and imported the Excel file into a Pandas DataFrame, but it did so using several default settings that we should consider changing.
For example, the Row Name column values were imported as data, while we may want them to be read as index labels. Chances are each row in your spreadsheet isn’t going to be labeled like in our example, but it’s still something to keep in mind.
The default read_excel parameters also assumed that the first row was a list of column names, which it incorporated automatically as column labels within the DataFrame.
Fortunately, if we want to change how read_excel
reads in these files, we have a several argument options we can play around with.
Note: If your Excel file isn’t stored in the same location as your Python script, you would replace
Pandas read_excel Function
The Pandas read_excel
function contains numerous options for importing data. We’re going to address the essential options for converting Excel files to Pandas DataFrames. These options are extremely common and really important to understand since they can drastically change how the read_excel function behaves.
Here are a few the options you’ll want to understand when working with the read_excel function in Pandas. These options follow the format
read_excel(io, [Other Options])
Important Pandas read_excel Options
Argument | Description |
---|---|
io | A string containing the pathname of the given Excel file. |
sheet_name | The Excel sheet name, or sheet number, of the data you want to import. The sheet number can be an integer where 0 is the first sheet, 1 is the second, etc. If a list of sheet names/numbers are given, then the output will be a dictionary of DataFrames. The default is to read all the sheets and output a dictionary of DataFrames. |
header | Row number to use for the list of column labels. The default is 0, indicating that the first row is assumed to contain the column labels. If the data does not have a row of column labels, |
names | A separate Python list input of column names. This option is |
index_col | Specifies which column should be used for row indices. The default option is |
usecols | An integer, list of integers, or string that specifies the columns to be imported into the DataFrame. The default is to import all columns. If a string is given, then Pandas uses the standard Excel format to select columns (e.g. "A:C,F,G" will import columns A, B, C, F, and G). |
skiprows | The number of rows to skip at the top of the Excel sheet. Default is 0. This option is useful for skipping rows in Excel that contain explanatory information about the data below it. |
Pandas read_excel Examples
Naming Rows in your Excel DataFrame
Suppose in our earlier example we wanted to use the “Row Name” column as a series of row indices (instead of it defaulting to 0, 1, 2, etc.). We can use the read_excel index_col
option to automatically set the row index labels:
import pandas as pd
df = pd.read_excel('Sample.xlsx', sheet_name='Sheet1', index_col=0)
print(df)
> Column A Column B Column C
> Row Name
> Row 1 1 2 3
> Row 2 4 5 6
> Row 3 7 8 9
Code More, Distract Less: Support Our Ad-Free Site
You might have noticed we removed ads from our site - we hope this enhances your learning experience. To help sustain this, please take a look at our Python Developer Kit and our comprehensive cheat sheets. Each purchase directly supports this site, ensuring we can continue to offer you quality, distraction-free tutorials.
Naming Columns in your Excel DataFrame
Alternatively, you can use the names
option to specify your own set of column names. This way, you don’t have to rely on the column names specified in your spreadsheet.
import pandas as pd
colNames = ["Better A", "Better B", "Better C"]
df = pd.read_excel('Sample.xlsx', sheet_name='Sheet1', names=colNames, index_col=0)
print(df)
> Better A Better B Better C
> Row Name
> Row 1 1 2 3
> Row 2 4 5 6
> Row 3 7 8 9
Skipping rows and headers
We can use the header
and skiprows
option to tell Pandas not to use the column labels already in our spreadsheet. Note that both header=None
and skiprows=1
must be set in this case, as the first row of actual data will be skipped if the header
option is missing.
import pandas as pd
df = pd.read_excel('Sample.xlsx', sheet_name='Sheet1', header=None, skiprows=1, index_col=0)
print(df)
> 1 2 3
> 0
> Row 1 1 2 3
> Row 2 4 5 6
> Row 3 7 8 9
Specifying which columns to import
Finally, if we want to eliminate the index label column entirely and only import certain columns of data, we can tell the usecols
option to only read certain columns from our Excel spreadsheet. In this example, we tell Pandas to only import the second and fourth columns (
import pandas as pd
df = pd.read_excel('Sample.xlsx', sheet_name='Sheet1', header=None, skiprows=1, usecols="B,D")
print(df)
> 0 1
> 0 1 3
> 1 4 6
> 2 7 9
How to import multiple sheets into Pandas
If no sheet_name
is specified in our call to the read_excel function, all your sheets will be imported and the output will be a dictionary of DataFrames. The data in each sheet can be referenced by calling the corresponding “key,” which is just the sheet name or sheet number (starting at 0).
If you only want to import a couple sheets, rather than all the sheets, you have a couple options. Your first option is to just omit the sheet_name
argument and work with the output dictionary to grab the sheet data you want. Again, the “key” in the dictionary is just the sheet name or sheet number and the corresponding value of each key is the DataFrame containing the sheet’s data.
Your second option is to pair the Pandas read_excel function with the ExcelFile function to store the data from each sheet in different DataFrames. You could run the code without the ExcelFile line, but I find it makes the code cleaner since you only have to specify the spreadsheet path once. Here’s an example:
import pandas as pd
xls = pd.ExcelFile('Sample.xlsx')
df1 = pd.read_excel(xls, sheet_name='Sheet1')
df2 = pd.read_excel(xls, sheet_name='Sheet2')
Code More, Distract Less: Support Our Ad-Free Site
You might have noticed we removed ads from our site - we hope this enhances your learning experience. To help sustain this, please take a look at our Python Developer Kit and our comprehensive cheat sheets. Each purchase directly supports this site, ensuring we can continue to offer you quality, distraction-free tutorials.
I hope you found this Pandas read_excel tutorial helpful. We’ve presented a bunch of examples to show you how to import your Excel spreadsheets into a Pandas DataFrame.
If you found it helpful, share this article with your friends, classmates, and coworkers on Facebook and Twitter! When you spread the word on social media, you’re helping us grow so we can continue to provide free tutorials like this one for years to come.