The Pandas read_csv function lets you import data from CSV and plain-text files into DataFrames. This tutorial provides several Pandas read_csv examples to teach you how the function works and how you can use it to import your own files. If you’re looking for how to read Excel files with Pandas, I encourage you to read our tutorial about the read_excel function.
Importing CSV Files into a Pandas DataFrame
It’s common to need to import CSV files into a Pandas DataFrame. We’re going to look at different ways to import delimited files into DataFrames using the versatile Pandas read_csv
function.
Despite the name of the function, the read_csv function actually works to import any document with a delimiter. All we need to do is change the default delimiter within our read_csv function options. We’ll get to that later, but for now let’s assume we have the CSV file
Row Name,Column A,Column B,Column C Row 1,1,2,3 Row 2,4,5,6 Row 3,7,8,9
We can quickly import this data using the Pandas read_csv function with the default options:
import pandas as pd
df = pd.read_csv('Sample.csv')
print(df)
Row Name Column A Column B Column C
0 Row 1 1 2 3
1 Row 2 4 5 6
2 Row 3 7 8 9
The Python example above easily read and imported our CSV file into a Pandas DataFrame, but it didn’t translate our file into the best-looking dataframe. That’s because we relied in the read_csv default settings. We can change these options to modify how the conversion behaves.
For example, the “Row Name” column values were imported as data. The actual index labels were given number values (0, 1, 2). Chances are, we’d prefer our custom row names to be read as these index labels, instead.
The default read_csv settings also assumed the first row was a list of column names, which it incorporated automatically as column labels within the DataFrame. If we want to change how read_csv
reads in these files, we have to start tinkering with its many powerful options.
Before we get into the options, I want to remind you that your csv file doesn’t have to be located in the same directory as your existing Python file. You can specify a full file path or even navigate the file name just like you would navigate directories. For example, if your CSV file were located one directory up from your Python file and in a subfolder called
import pandas as pd
df = pd.read_csv('../my_data/Sample.csv')
Pandas read_csv Function
The Pandas read_csv function allows a number of options for importing data. Many of these options, you’ll never touch. In an effort to keep you sane, we’re going to go over the essential options for everyday use.
The read_csv function takes on the form:
read_csv(filepath_or_buffer, [Other Options])
This table explains what the
Important Pandas read_csv Options
Argument | Description |
---|---|
filepath_or_buffer | A string containing the pathname of the given plain text file. |
sep | The delimiter used to separate columns of data. The default is a comma "," for CSV files. If files are delimited by spaces then " " can be used, or `\t" for tabs. |
delim_whitespace | A boolean value to specify if whitespace should be used as a separator. The default is sep=" " or sep="\t" , as those consider only a single instance to be a delimiter, whereas `delim_whitespace` will take into account any number of whitespace characters (i.e. 1 space is treated the same as 3 spaces). If using whitespace as a delimiter, it is often preferable to use this option over the sep option. |
header | Row number to use for the list of column labels. If the data does not have a row of column labels, |
names | A separate Python list input of column names. This option is |
index_col | Specifies which column should be used for row indices. The default option is |
usecols | An integer, list of integers, column name string, or list of column name strings that specify the columns to be imported into the DataFrame. The default is to import all columns. |
skiprows | The number of rows to skip at the top of the CSV or text file. The default is 0, so all lines are read. This option is useful for skipping rows in a spreadsheet containing explanatory information about the data below it, like comment cards. |
quotechar | If values with delimiting characters are grouped with a specified character given to quotechar, then those delimiters will be ignored and grouped within the value. This will make more sense when we get down to an example. |
Get Our Python Developer Kit for Free
I put together a Python Developer Kit with over 100 pre-built Python scripts covering data structures, Pandas, NumPy, Seaborn, machine learning, file processing, web scraping and a whole lot more - and I want you to have it for free. Enter your email address below and I'll send a copy your way.
Suppose in the above example, we wanted to use the “Row Name” column as a series of row indices instead of the default numeric indices (0, 1, 2). We can use the Pandas read_csv index_col
option to automatically set the row index labels, like in this example:
import pandas as pd
df = pd.read_csv('Sample.csv', index_col=0)
print(df)
> Column A Column B Column C
> Row Name
> Row 1 1 2 3
> Row 2 4 5 6
> Row 3 7 8 9
You see how when we print our Pandas DataFrame this time, the first column is now taken directly from the first column of our CSV file. We no longer have the ugly 0, 1, and 2 printed before our table is printed.
Similarly, you can use the names
option to use a different set of column names. Note: The names
option overrides the header
option before the header row is read, therefore if the data contains a header, then the header
option will need to be specified explicitly. This functionality is not the same as the read_excel
function, wherein the names overwrite the default header obtained from the data. This is an extremely important difference between the Pandas read_csv function and the read_excel function.
import pandas as pd
colNames = ["Better A", "Better B", "Better C"]
df = pd.read_csv('Sample.csv', names=colNames, index_col=0, header=0)
print(df)
> Better A Better B Better C
> Row 1 1 2 3
> Row 2 4 5 6
> Row 3 7 8 9
Plain-Text Delimited Files with the read_csv Function
Believe it or not, the Pandas read_csv function can do a lot more than just read CSV files. We’ll take a look at another read_csv example, but this time we want to read a plain text file.
Suppose we have the following file Sample.txt
that is delimited by whitespace (tabs and/or spaces) with the following form:
Row_Name Column_A Column_B Column_C Row_1 1 2 3 Row_2 4 5 6 Row_3 7 8 9
Notice in the above file, each value is separated by any combination of whitespace characters (both tabs and spaces), and all names are without whitespace characters. We can import this file using Pandas read_csv
with the delim_whitespace
option.
import pandas as pd
df = pd.read_csv('Sample.txt', index_col=0, delim_whitespace=True)
print(df)
> Column_A Column_B Column_C
> Row_Name
> Row_1 1 2 3
> Row_2 4 5 6
> Row_3 7 8 9
Note that we used the delim_whitespace
option rather than the sep=" "
method, since the latter would not catch the series of spaces and/or tabs.
If we used sep instead, our dataframe would print out like this ugly example:
import pandas as pd
df = pd.read_csv('Sample.txt', index_col=0, sep=" ") # Not correct!
print(df) # Expect ugly output
> Column_A\t\tColumn_B Unnamed: 2 Column_C
> Row_Name\t
> Row_1 1 2 3
> Row_2 4 5 6
> Row_3 7 8 9
Generally, if whitespace is used as a delimiter, the delim_whitespace
option should be used before a specific application of the sep
option. This is an important tip to keep in mind when you’re running your own Pandas read_csv examples.
In the above
"Row Name" "Column A" "Column B" "Column C" "Row 1" 1 2 3 "Row 2" 4 5 6 "Row 3" 7 8 9
The quotes allow us to take advantage of the read_csv quotechar option. The read_csv function does not automatically recognize quotes as a grouping device, so we have to include the quotechar
option. The quotechar
option takes a string, and if we wish to include a literal double quote character (“) then we will have to escape it with a backslash in the form quotechar="\""
import pandas as pd
df = pd.read_csv('Sample.txt', index_col=0, delim_whitespace=True, quotechar="\"")
print(df)
> Column A Column B Column C
> Row Name
> Row 1 1 2 3
> Row 2 4 5 6
> Row 3 7 8 9
That means that any instance of a delimiter within quotes will be excluded. Our column and row names won’t be split by the read_csv function if we include a space in the labels!
Get Our Python Developer Kit for Free
I put together a Python Developer Kit with over 100 pre-built Python scripts covering data structures, Pandas, NumPy, Seaborn, machine learning, file processing, web scraping and a whole lot more - and I want you to have it for free. Enter your email address below and I'll send a copy your way.
Did you find this free tutorial helpful? Share this article with your friends, classmates, and coworkers on Facebook and Twitter! When you spread the word on social media, you’re helping us grow so we can continue to provide free tutorials like this one for years to come.