The Python Pandas read_excel function lets you import data from Excel files into a Pandas DataFrame. This tutorial includes several Pandas read_excel examples to teach you how to properly use the function to read Excel files in Python Pandas. It’s closely related to our article about the Pandas read_csv function, but this tutorial focuses exclusively on reading Excel files.
- Importing Excel Files into a Pandas DataFrame
- Pandas read_excel Function
- Pandas read_excel Examples
Importing Excel Files into a Pandas DataFrame
Data from an Excel formatted file can be retrieved using the Pandas
from_excel function. This is a really neat way to read Excel files in Python, so let’s walk through a couple examples. I especially like it because Pandas serves as the bridge between my VBA Tutorials and Python Tutorials.
Suppose we have the sample Excel file
We can quickly import this spreadsheet data into Python using the following code:
import pandas as pd df = pd.read_excel('Sample.xlsx', sheet_name='Sheet1') print(df) > Row Name Column A Column B Column C > 0 Row 1 1 2 3 > 1 Row 2 4 5 6 > 2 Row 3 7 8 9
The above code easily read and imported the Excel file into a Pandas DataFrame, but it did so using several default settings that we should consider changing.
For example, the Row Name column values were imported as data, while we may want them to be read as index labels. Chances are each row in your spreadsheet isn’t going to be labeled like in our example, but it’s still something to keep in mind.
The default read_excel parameters also assumed that the first row was a list of column names, which it incorporated automatically as column labels within the DataFrame.
Fortunately, if we want to change how
read_excel reads in these files, we have a several argument options we can play around with.
Note: If your Excel file isn’t stored in the same location as your Python script, you would replace
Pandas read_excel Function
read_excel function contains numerous options for importing data. We’re going to address the essential options for converting Excel files to Pandas DataFrames. These options are extremely common and really important to understand since they can drastically change how the read_excel function behaves.
Here are a few the options you’ll want to understand when working with the read_excel function in Pandas. These options follow the format
read_excel(io, [Other Options])
Important Pandas read_excel Options
|io||A string containing the pathname of the given Excel file.|
|sheet_name||The Excel sheet name, or sheet number, of the data you want to import. The sheet number can be an integer where 0 is the first sheet, 1 is the second, etc. If a list of sheet names/numbers are given, then the output will be a dictionary of DataFrames. The default is to read all the sheets and output a dictionary of DataFrames.|
|header||Row number to use for the list of column labels. The default is 0, indicating that the first row is assumed to contain the column labels. If the data does not have a row of column labels,
|names||A separate Python list input of column names. This option is
|index_col||Specifies which column should be used for row indices. The default option is
|usecols||An integer, list of integers, or string that specifies the columns to be imported into the DataFrame. The default is to import all columns. If a string is given, then Pandas uses the standard Excel format to select columns (e.g. "A:C,F,G" will import columns A, B, C, F, and G).|
|skiprows||The number of rows to skip at the top of the Excel sheet. Default is 0. This option is useful for skipping rows in Excel that contain explanatory information about the data below it.|
Pandas read_excel Examples
Naming Rows in your Excel DataFrame
Suppose in our earlier example we wanted to use the “Row Name” column as a series of row indices (instead of it defaulting to 0, 1, 2, etc.). We can use the read_excel
index_col option to automatically set the row index labels:
import pandas as pd df = pd.read_excel('Sample.xlsx', sheet_name='Sheet1', index_col=0) print(df) > Column A Column B Column C > Row Name > Row 1 1 2 3 > Row 2 4 5 6 > Row 3 7 8 9
Enter your email address for more free Python tutorials and tips.
Naming Columns in your Excel DataFrame
Alternatively, you can use the
names option to specify your own set of column names. This way, you don’t have to rely on the column names specified in your spreadsheet.
import pandas as pd colNames = ["Better A", "Better B", "Better C"] df = pd.read_excel('Sample.xlsx', sheet_name='Sheet1', names=colNames, index_col=0) print(df) > Better A Better B Better C > Row Name > Row 1 1 2 3 > Row 2 4 5 6 > Row 3 7 8 9
Skipping rows and headers
We can use the
skiprows option to tell Pandas not to use the column labels already in our spreadsheet. Note that both
skiprows=1 must be set in this case, as the first row of actual data will be skipped if the
header option is missing.
import pandas as pd df = pd.read_excel('Sample.xlsx', sheet_name='Sheet1', header=None, skiprows=1, index_col=0) print(df) > 1 2 3 > 0 > Row 1 1 2 3 > Row 2 4 5 6 > Row 3 7 8 9
Specifying which columns to import
Finally, if we want to eliminate the index label column entirely and only import certain columns of data, we can tell the
usecols option to only read certain columns from our Excel spreadsheet. In this example, we tell Pandas to only import the second and fourth columns (
import pandas as pd df = pd.read_excel('Sample.xlsx', sheet_name='Sheet1', header=None, skiprows=1, usecols="B,D") print(df) > 0 1 > 0 1 3 > 1 4 6 > 2 7 9
How to import multiple sheets into Pandas
sheet_name is specified in our call to the read_excel function, all your sheets will be imported and the output will be a dictionary of DataFrames. The data in each sheet can be referenced by calling the corresponding “key,” which is just the sheet name or sheet number (starting at 0).
If you only want to import a couple sheets, rather than all the sheets, you have a couple options. Your first option is to just omit the
sheet_name argument and work with the output dictionary to grab the sheet data you want. Again, the “key” in the dictionary is just the sheet name or sheet number and the corresponding value of each key is the DataFrame containing the sheet’s data.
Your second option is to pair the Pandas read_excel function with the ExcelFile function to store the data from each sheet in different DataFrames. You could run the code without the ExcelFile line, but I find it makes the code cleaner since you only have to specify the spreadsheet path once. Here’s an example:
import pandas as pd xls = pd.ExcelFile('Sample.xlsx') df1 = pd.read_excel(xls, sheet_name='Sheet1') df2 = pd.read_excel(xls, sheet_name='Sheet2')
Enter your email address for more free Python tutorials and tips.
I hope you found this Pandas read_excel tutorial helpful. We’ve presented a bunch of examples to show you how to import your Excel spreadsheets into a Pandas DataFrame.
If you found it helpful, share this article with your friends, classmates, and coworkers on Facebook and Twitter! When you spread the word on social media, you’re helping us grow so we can continue to provide free tutorials like this one for years to come.