The Pandas read_csv function lets you import data from CSV and plain-text files into DataFrames. This tutorial provides several Pandas read_csv examples to teach you how the function works and how you can use it to import your own files. If you’re looking for how to read Excel files with Pandas, I encourage you to read our tutorial about the read_excel function.
- Importing CSV Files into a Pandas DataFrame
Importing CSV Files into a Pandas DataFrame
It’s common to need to import CSV files into a Pandas DataFrame. We’re going to look at different ways to import delimited files into DataFrames using the versatile Pandas
Despite the name of the function, the read_csv function actually works to import any document with a delimiter. All we need to do is change the default delimiter within our read_csv function options. We’ll get to that later, but for now let’s assume we have the CSV file
Row Name,Column A,Column B,Column C Row 1,1,2,3 Row 2,4,5,6 Row 3,7,8,9
We can quickly import this data using the Pandas read_csv function with the default options:
import pandas as pd df = pd.read_csv('Sample.csv') print(df) Row Name Column A Column B Column C 0 Row 1 1 2 3 1 Row 2 4 5 6 2 Row 3 7 8 9
The Python example above easily read and imported our CSV file into a Pandas DataFrame, but it didn’t translate our file into the best-looking dataframe. That’s because we relied in the read_csv default settings. We can change these options to modify how the conversion behaves.
For example, the “Row Name” column values were imported as data. The actual index labels were given number values (0, 1, 2). Chances are, we’d prefer our custom row names to be read as these index labels, instead.
The default read_csv settings also assumed the first row was a list of column names, which it incorporated automatically as column labels within the DataFrame. If we want to change how
read_csv reads in these files, we have to start tinkering with its many powerful options.
Before we get into the options, I want to remind you that your csv file doesn’t have to be located in the same directory as your existing Python file. You can specify a full file path or even navigate the file name just like you would navigate directories. For example, if your CSV file were located one directory up from your Python file and in a subfolder called
import pandas as pd df = pd.read_csv('../my_data/Sample.csv')
Pandas read_csv Function
The Pandas read_csv function allows a number of options for importing data. Many of these options, you’ll never touch. In an effort to keep you sane, we’re going to go over the essential options for everyday use.
The read_csv function takes on the form:
read_csv(filepath_or_buffer, [Other Options])
This table explains what the
Important Pandas read_csv Options
|filepath_or_buffer||A string containing the pathname of the given plain text file.|
|sep||The delimiter used to separate columns of data. The default is a comma "," for CSV files. If files are delimited by spaces then " " can be used, or `\t" for tabs.
|delim_whitespace||A boolean value to specify if whitespace should be used as a separator. The default is
|header||Row number to use for the list of column labels. If the data does not have a row of column labels,
|names||A separate Python list input of column names. This option is
|index_col||Specifies which column should be used for row indices. The default option is
|usecols||An integer, list of integers, column name string, or list of column name strings that specify the columns to be imported into the DataFrame. The default is to import all columns.|
|skiprows||The number of rows to skip at the top of the CSV or text file. The default is 0, so all lines are read. This option is useful for skipping rows in a spreadsheet containing explanatory information about the data below it, like comment cards.|
|quotechar||If values with delimiting characters are grouped with a specified character given to quotechar, then those delimiters will be ignored and grouped within the value. This will make more sense when we get down to an example.|
Enter your email address for more free Python tutorials and tips.
Suppose in the above example, we wanted to use the “Row Name” column as a series of row indices instead of the default numeric indices (0, 1, 2). We can use the Pandas read_csv
index_col option to automatically set the row index labels, like in this example:
import pandas as pd df = pd.read_csv('Sample.csv', index_col=0) print(df) > Column A Column B Column C > Row Name > Row 1 1 2 3 > Row 2 4 5 6 > Row 3 7 8 9
You see how when we print our Pandas DataFrame this time, the first column is now taken directly from the first column of our CSV file. We no longer have the ugly 0, 1, and 2 printed before our table is printed.
Similarly, you can use the
names option to use a different set of column names. Note: The
names option overrides the
header option before the header row is read, therefore if the data contains a header, then the
header option will need to be specified explicitly. This functionality is not the same as the
read_excel function, wherein the names overwrite the default header obtained from the data. This is an extremely important difference between the Pandas read_csv function and the read_excel function.
import pandas as pd colNames = ["Better A", "Better B", "Better C"] df = pd.read_csv('Sample.csv', names=colNames, index_col=0, header=0) print(df) > Better A Better B Better C > Row 1 1 2 3 > Row 2 4 5 6 > Row 3 7 8 9
Plain-Text Delimited Files with the read_csv Function
Believe it or not, the Pandas read_csv function can do a lot more than just read CSV files. We’ll take a look at another read_csv example, but this time we want to read a plain text file.
Suppose we have the following file
Sample.txt that is delimited by whitespace (tabs and/or spaces) with the following form:
Row_Name Column_A Column_B Column_C Row_1 1 2 3 Row_2 4 5 6 Row_3 7 8 9
Notice in the above file, each value is separated by any combination of whitespace characters (both tabs and spaces), and all names are without whitespace characters. We can import this file using Pandas
read_csv with the
import pandas as pd df = pd.read_csv('Sample.txt', index_col=0, delim_whitespace=True) print(df) > Column_A Column_B Column_C > Row_Name > Row_1 1 2 3 > Row_2 4 5 6 > Row_3 7 8 9
Note that we used the
delim_whitespace option rather than the
sep=" " method, since the latter would not catch the series of spaces and/or tabs.
If we used sep instead, our dataframe would print out like this ugly example:
import pandas as pd df = pd.read_csv('Sample.txt', index_col=0, sep=" ") # Not correct! print(df) # Expect ugly output > Column_A\t\tColumn_B Unnamed: 2 Column_C > Row_Name\t > Row_1 1 2 3 > Row_2 4 5 6 > Row_3 7 8 9
Generally, if whitespace is used as a delimiter, the
delim_whitespace option should be used before a specific application of the
sep option. This is an important tip to keep in mind when you’re running your own Pandas read_csv examples.
In the above
"Row Name" "Column A" "Column B" "Column C" "Row 1" 1 2 3 "Row 2" 4 5 6 "Row 3" 7 8 9
The quotes allow us to take advantage of the read_csv quotechar option. The read_csv function does not automatically recognize quotes as a grouping device, so we have to include the
quotechar option. The
quotechar option takes a string, and if we wish to include a literal double quote character (“) then we will have to escape it with a backslash in the form
import pandas as pd df = pd.read_csv('Sample.txt', index_col=0, delim_whitespace=True, quotechar="\"") print(df) > Column A Column B Column C > Row Name > Row 1 1 2 3 > Row 2 4 5 6 > Row 3 7 8 9
That means that any instance of a delimiter within quotes will be excluded. Our column and row names won’t be split by the read_csv function if we include a space in the labels!
Enter your email address for more free Python tutorials and tips.
Did you find this free tutorial helpful? Share this article with your friends, classmates, and coworkers on Facebook and Twitter! When you spread the word on social media, you’re helping us grow so we can continue to provide free tutorials like this one for years to come.