The Pandas read_csv function lets you import data from CSV and plain-text files into DataFrames. This tutorial provides several Pandas read_csv examples to teach you how the function works and how you can use it to import your own files. If you’re looking for how to read Excel files with Pandas, I encourage you to read our tutorial about the read_excel function.

Importing CSV Files into a Pandas DataFrame

It’s common to need to import CSV files into a Pandas DataFrame. We’re going to look at different ways to import delimited files into DataFrames using the versatile Pandas read_csv function.

Despite the name of the function, the read_csv function actually works to import any document with a delimiter. All we need to do is change the default delimiter within our read_csv function options. We’ll get to that later, but for now let’s assume we have the CSV file Sample.csv in our working directory. Our CSV file contains the following text:

Row Name,Column A,Column B,Column C
Row 1,1,2,3
Row 2,4,5,6
Row 3,7,8,9

We can quickly import this data using the Pandas read_csv function with the default options:

import pandas as pd
df = pd.read_csv('Sample.csv')
  Row Name  Column A  Column B   Column C
0    Row 1         1          2         3
1    Row 2         4          5         6
2    Row 3         7          8         9

The Python example above easily read and imported our CSV file into a Pandas DataFrame, but it didn’t translate our file into the best-looking dataframe. That’s because we relied in the read_csv default settings. We can change these options to modify how the conversion behaves.

For example, the “Row Name” column values were imported as data. The actual index labels were given number values (0, 1, 2). Chances are, we’d prefer our custom row names to be read as these index labels, instead.

The default read_csv settings also assumed the first row was a list of column names, which it incorporated automatically as column labels within the DataFrame. If we want to change how read_csv reads in these files, we have to start tinkering with its many powerful options.

Before we get into the options, I want to remind you that your csv file doesn’t have to be located in the same directory as your existing Python file. You can specify a full file path or even navigate the file name just like you would navigate directories. For example, if your CSV file were located one directory up from your Python file and in a subfolder called my_data, your Python script would look like this:

import pandas as pd
df = pd.read_csv('../my_data/Sample.csv')

Pandas read_csv Function

The Pandas read_csv function allows a number of options for importing data. Many of these options, you’ll never touch. In an effort to keep you sane, we’re going to go over the essential options for everyday use.

The read_csv function takes on the form:

read_csv(filepath_or_buffer, [Other Options])

This table explains what the filepath_or_buffer argument is and it lists a bunch of the other important read_csv options you need to know.

Important Pandas read_csv Options

Argument Description
filepath_or_buffer A string containing the pathname of the given plain text file.
sep The delimiter used to separate columns of data. The default is a comma "," for CSV files. If files are delimited by spaces then " " can be used, or `\t" for tabs.
delim_whitespace A boolean value to specify if whitespace should be used as a separator. The default is False in which case the read_csv sep delimiter takes precedence. If set to True, any amount of whitespace (spaces or tabs) will be considered a new delimiter. This differs from setting sep=" " or sep="\t", as those consider only a single instance to be a delimiter, whereas `delim_whitespace` will take into account any number of whitespace characters (i.e. 1 space is treated the same as 3 spaces). If using whitespace as a delimiter, it is often preferable to use this option over the sep option.
header Row number to use for the list of column labels. If the data does not have a row of column labels, None should be specified. The default is 0, unless column names are passed with the names option, in which case it defaults to None.
names A separate Python list input of column names. This option is None by default. This option is the equivalent of assigning a list of column names to the columns attribute of the output DataFrame, like we discussed in our Pandas DataFrame from Dictionary tutorial.
index_col Specifies which column should be used for row indices. The default option is None, meaning all columns are included in the data and a range of numbers is used as the row indices.
usecols An integer, list of integers, column name string, or list of column name strings that specify the columns to be imported into the DataFrame. The default is to import all columns.
skiprows The number of rows to skip at the top of the CSV or text file. The default is 0, so all lines are read. This option is useful for skipping rows in a spreadsheet containing explanatory information about the data below it, like comment cards.
quotechar If values with delimiting characters are grouped with a specified character given to quotechar, then those delimiters will be ignored and grouped within the value. This will make more sense when we get down to an example.

Get Our Python Developer Kit for Free

I put together a Python Developer Kit with over 100 pre-built Python scripts covering data structures, Pandas, NumPy, Seaborn, machine learning, file processing, web scraping and a whole lot more - and I want you to have it for free. Enter your email address below and I'll send a copy your way.

Yes, I'll take a free Python Developer Kit

Suppose in the above example, we wanted to use the “Row Name” column as a series of row indices instead of the default numeric indices (0, 1, 2). We can use the Pandas read_csv index_col option to automatically set the row index labels, like in this example:

import pandas as pd
df = pd.read_csv('Sample.csv',  index_col=0)
>           Column A  Column B   Column C
> Row Name                               
> Row 1            1          2         3
> Row 2            4          5         6
> Row 3            7          8         9

You see how when we print our Pandas DataFrame this time, the first column is now taken directly from the first column of our CSV file. We no longer have the ugly 0, 1, and 2 printed before our table is printed.

Similarly, you can use the names option to use a different set of column names. Note: The names option overrides the header option before the header row is read, therefore if the data contains a header, then the header option will need to be specified explicitly. This functionality is not the same as the read_excel function, wherein the names overwrite the default header obtained from the data. This is an extremely important difference between the Pandas read_csv function and the read_excel function.

import pandas as pd
colNames = ["Better A", "Better B", "Better C"]
df = pd.read_csv('Sample.csv', names=colNames, index_col=0, header=0)
>        Better A  Better B  Better C
> Row 1         1         2         3
> Row 2         4         5         6
> Row 3         7         8         9

Plain-Text Delimited Files with the read_csv Function

Believe it or not, the Pandas read_csv function can do a lot more than just read CSV files. We’ll take a look at another read_csv example, but this time we want to read a plain text file.

Suppose we have the following file Sample.txt that is delimited by whitespace (tabs and/or spaces) with the following form:

Row_Name	 Column_A		Column_B  Column_C
Row_1 1 2 3
Row_2 4 5 6
Row_3 7 8 9

Notice in the above file, each value is separated by any combination of whitespace characters (both tabs and spaces), and all names are without whitespace characters. We can import this file using Pandas read_csv with the delim_whitespace option.

import pandas as pd
df = pd.read_csv('Sample.txt', index_col=0, delim_whitespace=True)
>           Column_A  Column_B  Column_C
> Row_Name                              
> Row_1            1         2         3
> Row_2            4         5         6
> Row_3            7         8         9

Note that we used the delim_whitespace option rather than the sep=" " method, since the latter would not catch the series of spaces and/or tabs.

If we used sep instead, our dataframe would print out like this ugly example:

import pandas as pd
df = pd.read_csv('Sample.txt', index_col=0, sep=" ")  # Not correct!
print(df)  # Expect ugly output
>             Column_A\t\tColumn_B  Unnamed: 2  Column_C
> Row_Name\t                                            
> Row_1                          1           2         3
> Row_2                          4           5         6
> Row_3                          7           8         9

Generally, if whitespace is used as a delimiter, the delim_whitespace option should be used before a specific application of the sep option. This is an important tip to keep in mind when you’re running your own Pandas read_csv examples.

In the above Sample.txt file, we had to include underscores in the names to avoid the spaces in the names being used as delimiters. The output names are readable, but if we want a cleaner format we can use quotes to group characters into a single value. Lets assume we’ve done this to the file Sample.txt so that it looks like the following:

"Row Name"	 "Column A"		"Column B"  "Column C"
"Row 1" 1 2 3
"Row 2" 4 5 6
"Row 3" 7 8 9

The quotes allow us to take advantage of the read_csv quotechar option. The read_csv function does not automatically recognize quotes as a grouping device, so we have to include the quotechar option. The quotechar option takes a string, and if we wish to include a literal double quote character (“) then we will have to escape it with a backslash in the form quotechar="\""

import pandas as pd
df = pd.read_csv('Sample.txt', index_col=0, delim_whitespace=True, quotechar="\"") 
>           Column A  Column B  Column C
> Row Name                              
> Row 1            1         2         3
> Row 2            4         5         6
> Row 3            7         8         9

That means that any instance of a delimiter within quotes will be excluded. Our column and row names won’t be split by the read_csv function if we include a space in the labels!

Get Our Python Developer Kit for Free

I put together a Python Developer Kit with over 100 pre-built Python scripts covering data structures, Pandas, NumPy, Seaborn, machine learning, file processing, web scraping and a whole lot more - and I want you to have it for free. Enter your email address below and I'll send a copy your way.

Yes, I'll take a free Python Developer Kit

Did you find this free tutorial helpful? Share this article with your friends, classmates, and coworkers on Facebook and Twitter! When you spread the word on social media, you’re helping us grow so we can continue to provide free tutorials like this one for years to come.