Introduction to Importing Python Data to Pandas DataFrames

This tutorial covers the basics of importing data from built-in Python lists and dictionaries into Pandas DataFrame structures. Importing data from external sources such as Microsoft Excel will be covered in later tutorials. The Pandas DataFrame structure provides a suite of tools for the manipulation and inspection of data. DataFrames provide a powerful backbone for any data science project, and knowing how to create them from existing data is crucial.

Remember that using Pandas requires both Pandas and NumPy [1] to be installed in the execution environment. Once the packages are installed, every module that uses Pandas needs an import statement such as import pandas as pd at the top. The examples below use the pd alias.

This tutorial will show you how to convert dictionaries and lists to Pandas DataFrames. We’ll also explain how to create a Pandas DataFrame from a list of dicts and a list of lists. However your Python data is structured, this tutorial will show you how to convert it into a Pandas DataFrame.


Convert Dictionary to Pandas DataFrame

Pandas has a built-in way to convert a dictionary whose keys are column labels, and whose values are lists of table entries, into a DataFrame. In other words, Pandas will convert the following dictionary

dictOne = {"Column A":[1, 2, 3],
		   "Column B":[4, 5, 6],
		   "Column C":[7, 8, 9]}

into the following Pandas DataFrame:

import pandas as pd
pd.DataFrame(dictOne)
>    Column A  Column B  Column C
> 0         1         4         7
> 1         2         5         8
> 2         3         6         9

As the example shows, converting a dictionary to a DataFrame is as simple as passing the dictionary to the pandas.DataFrame class constructor. There is no call to the from_dict method here: the constructor handles a dictionary of columns on its own and produces the same result that from_dict gives with its default settings. To access additional options for importing dictionaries, such as treating each entry as a row, we need to call from_dict directly. Let’s take a look at how we would do that.
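
As a quick check, the sketch below builds the same frame both ways and compares the results. The variable names are illustrative and not part of the original example.

```python
import pandas as pd

# Same data as the example above; the snake_case names are illustrative
dict_one = {"Column A": [1, 2, 3],
            "Column B": [4, 5, 6],
            "Column C": [7, 8, 9]}

df_constructor = pd.DataFrame(dict_one)
df_from_dict = pd.DataFrame.from_dict(dict_one)

# DataFrame.equals is True when both frames hold the same values with the same dtypes
same = df_constructor.equals(df_from_dict)
print(same)
```

For a plain dictionary of columns, either call should be interchangeable; from_dict becomes useful once the extra options described next are needed.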


from_dict Method

The pandas.DataFrame method from_dict has the format:

pandas.DataFrame.from_dict(data, orient='columns', dtype=None, columns=None)

where

  1. data is the dictionary object we wish to import (or convert)
  2. orient is how each dictionary entry forms the frame. The two common values are 'index' and 'columns', with 'columns' as the default
    • If each dictionary entry is a row of the DataFrame then use 'index'
    • If each dictionary entry is a column of the DataFrame then use 'columns'
  3. dtype is an option to force the given data into a specified type. For example dtype=str will convert all entries into strings, and dtype=float will convert entries into floating point numbers. The default is to infer the data type from the input.
  4. columns takes a list of column labels and can only be used when orient='index' [2]

We can replicate the above example by directly using the from_dict method like this:

import pandas as pd # Don't forget to import!
dictOne = {"Column A":[1, 2, 3],
		   "Column B":[4, 5, 6],
		   "Column C":[7, 8, 9]}
df = pd.DataFrame.from_dict(dictOne)	
print(df)
>    Column A  Column B  Column C
> 0         1         4         7
> 1         2         5         8
> 2         3         6         9
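
The dtype option described in the parameter list can be sketched the same way. The example below is a minimal illustration with made-up variable names. It forces the integer entries to floating point and then inspects the resulting column types.

```python
import pandas as pd

# Illustrative data matching the example above
dict_one = {"Column A": [1, 2, 3],
            "Column B": [4, 5, 6],
            "Column C": [7, 8, 9]}

# Without dtype, Pandas infers an integer type for every column
df_inferred = pd.DataFrame.from_dict(dict_one)

# With dtype=float, every entry is converted to a floating point number
df_float = pd.DataFrame.from_dict(dict_one, dtype=float)

print(df_inferred.dtypes)
print(df_float.dtypes)
```

Checking the dtypes attribute after an import is a cheap way to confirm that the data arrived in the type you expected.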

Now suppose we have a dictionary containing the rows of a DataFrame. We can import this data using the orient='index' option:

import pandas as pd # Don't forget to import!
dictTwo = {"Row 1":[1, 2, 3],
		   "Row 2":[4, 5, 6],
		   "Row 3":[7, 8, 9]}
df = pd.DataFrame.from_dict(dictTwo, orient='index')	
print(df)
>        0  1  2
> Row 1  1  2  3
> Row 2  4  5  6
> Row 3  7  8  9

Now we see that the dictionary keys become the index names, and the list entries become the column values. Notice that because a list of column labels was not specified, Pandas automatically used the integer positions 0, 1, 2 as the column names. We can add a custom set of names ourselves using the columns option.

import pandas as pd # Don't forget to import!
colNames = ["Column A", "Column B", "Column C"]
dictTwo = {"Row 1":[1, 2, 3],
		   "Row 2":[4, 5, 6],
		   "Row 3":[7, 8, 9]}
df = pd.DataFrame.from_dict(dictTwo, orient='index', columns=colNames)
print(df)
>        Column A  Column B  Column C
> Row 1         1         2         3
> Row 2         4         5         6
> Row 3         7         8         9

It’s worth noting that older versions of Pandas do not include the columns option. In these cases we can assign the list of names directly to the columns attribute of the DataFrame object to get the same result. After you convert your dictionary to a Pandas DataFrame, you can add your own column names using code like this:

colNames = ["Column A", "Column B", "Column C"]
df.columns = colNames
print(df)
>        Column A  Column B  Column C
> Row 1         1         2         3
> Row 2         4         5         6
> Row 3         7         8         9
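
The columns option only makes sense for row-oriented dictionaries, because column-oriented dictionaries already carry their labels as keys. The sketch below shows what to expect if the two are combined. It assumes a recent Pandas release, where this raises a ValueError; older releases may behave differently. The labels "X" and "Y" are placeholders.

```python
import pandas as pd

dict_one = {"Column A": [1, 2, 3],
            "Column B": [4, 5, 6]}

try:
    # The default orient is 'columns', so passing columns here is rejected
    pd.DataFrame.from_dict(dict_one, columns=["X", "Y"])
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Renaming after construction works regardless of orientation
df = pd.DataFrame.from_dict(dict_one)
df.columns = ["X", "Y"]
print(df)
```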


Convert List to Pandas DataFrame

There are many ways to convert lists into Pandas DataFrames. In this tutorial we will cover two popular methods, either of which may suit your application better than the other. Many approaches first convert the list data into a dictionary, but a direct conversion from a list is possible and can be a more straightforward solution.


Method 1: List Directly to DataFrame

The simplest method is to store the data as a list of lists, which the pandas.DataFrame object constructor can directly convert into a DataFrame.

import pandas as pd # Don't forget to import!
listOne = [[1, 2, 3],
		   [4, 5, 6],
		   [7, 8, 9]]
df = pd.DataFrame(listOne)  # The constructor accepts a list of lists directly
print(df)
>    0  1  2
> 0  1  2  3
> 1  4  5  6
> 2  7  8  9

The primary issue with this method is that the list itself carries no column labels or row labels. If we want to include them, we can assign them to the columns and index attributes separately as lists:

listOne = [[1, 2, 3],
		   [4, 5, 6],
		   [7, 8, 9]]
colNames = ["Column A", "Column B", "Column C"]
rowNames = ["Row 1", "Row 2", "Row 3"]
df = pd.DataFrame(listOne)
df.columns = colNames
df.index = rowNames
print(df)
>        Column A  Column B  Column C
> Row 1         1         2         3
> Row 2         4         5         6
> Row 3         7         8         9
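
The constructor also accepts the labels directly through its columns and index arguments, which avoids the two separate assignments. A minimal sketch with illustrative variable names:

```python
import pandas as pd

list_one = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]
col_names = ["Column A", "Column B", "Column C"]
row_names = ["Row 1", "Row 2", "Row 3"]

# Pass the labels at construction time instead of assigning them afterwards
df = pd.DataFrame(list_one, columns=col_names, index=row_names)
print(df)
```

Both routes should give the same labeled frame; passing the labels up front simply keeps the construction in one statement.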

You can see how adding the column and row labels helps us organize our DataFrames for our data science projects.

One drawback of this method appears when the column and row labels are already included in the list of lists. In this case, the pandas.DataFrame object constructor will naively store them as entries within the DataFrame:

listTwo = [["Rows", "Column A", "Column B", "Column C"],
		   ["Row 1", 1, 2, 3],
		   ["Row 2", 4, 5, 6],
		   ["Row 3", 7, 8, 9]]
df = pd.DataFrame(listTwo)
print(df)
>        0         1         2         3
> 0   Rows  Column A  Column B  Column C
> 1  Row 1         1         2         3
> 2  Row 2         4         5         6
> 3  Row 3         7         8         9

As you can see, the row indices and column names were stored as data. Because every column now holds strings, either alone or mixed with integers, Pandas gives each column the generic object dtype instead of an integer dtype. This isn’t what you want. A simple way around this is to clean up the list by moving the column names and row indices into separate lists before the DataFrame conversion, and then reattach them afterwards. This can be done using a script like the following:

listTwo = [["Rows", "Column A", "Column B", "Column C"],
		   ["Row 1", 1, 2, 3],
		   ["Row 2", 4, 5, 6],
		   ["Row 3", 7, 8, 9]]
colNames = listTwo[0][1:]  # Exclude the first entry, as the index column doesn't need a name
del listTwo[0]  # Remove the column names from the data
rowNames = []  # Initialize the row names list for looping
for row in listTwo:
	rowNames.append(row[0])  # Collect the row indices
	del row[0]  # Delete row indices from data list
df = pd.DataFrame(listTwo)
df.columns = colNames  # Add back in the column names
df.index = rowNames  # Add back in the index names
print(df)
>        Column A  Column B  Column C
> Row 1         1         2         3
> Row 2         4         5         6
> Row 3         7         8         9
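
Note that the script above modifies listTwo in place, so the original labels are gone from the list afterwards. If the source list needs to stay intact, one possible alternative is to build new lists with slicing and list comprehensions. This is a sketch with illustrative names, not the only way to do it:

```python
import pandas as pd

list_two = [["Rows", "Column A", "Column B", "Column C"],
            ["Row 1", 1, 2, 3],
            ["Row 2", 4, 5, 6],
            ["Row 3", 7, 8, 9]]

header, *rows = list_two               # Split the header row from the data rows
col_names = header[1:]                 # Skip the corner cell ("Rows")
row_names = [row[0] for row in rows]   # First entry of each row is its label
data = [row[1:] for row in rows]       # Remaining entries are the values

df = pd.DataFrame(data, index=row_names, columns=col_names)
print(df)
print(df.dtypes)  # The columns are now integer typed rather than object
```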

You can take this Python script and adapt it to your own project. It’s a powerful way of cleaning up your lists so you can use them in Pandas DataFrames.


Method 2: List to Dictionary to DataFrame Conversion

A less direct, but popular, method of converting lists to DataFrames is to first convert your lists into dictionaries. After that, you’d follow the instructions in our convert dictionary to Pandas DataFrame section. This method takes more steps and is less “Pythonic” than the direct list conversion, but you may prefer it based on the structure of your data or processes. The choice is yours.

We start with a list of column names and a list of row names. The cleanup script from Method 1, which splits a labeled list of lists into row indices, column names, and data, can produce them. The following examples assume these three lists already exist.

One way to join lists into dictionaries is to use the Python zip function, which pairs each column name with its list of data as a tuple. The dict function turns those tuples into a dictionary, which can then be converted into a DataFrame [3].

listOne = [[1, 2, 3],
		   [4, 5, 6],
		   [7, 8, 9]]
colNames = ["Column A", "Column B", "Column C"]
rowNames = ["Row 1", "Row 2", "Row 3"]
dictOne = dict(zip(colNames, listOne))
print(dictOne)
> {'Column A': [1, 2, 3], 'Column B': [4, 5, 6], 'Column C': [7, 8, 9]}

Notice that the above method assumes that the sublists are columns instead of rows. If each sublist is a row, then you’ll need to pair the sublists with the row indices instead of the column names, like this:

listOne = [[1, 2, 3],
		   [4, 5, 6],
		   [7, 8, 9]]
colNames = ["Column A", "Column B", "Column C"]
rowNames = ["Row 1", "Row 2", "Row 3"]
dictTwo = dict(zip(rowNames, listOne))
print(dictTwo)
> {'Row 1': [1, 2, 3], 'Row 2': [4, 5, 6], 'Row 3': [7, 8, 9]}

Now we can create the DataFrame using the conversion from dictionary to DataFrame:

# Assuming we have a dictionary of rows in dictTwo
df = pd.DataFrame.from_dict(dictTwo, orient='index')
df.columns = colNames	
print(df)
>        Column A  Column B  Column C
> Row 1         1         2         3
> Row 2         4         5         6
> Row 3         7         8         9
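
If the sublists are rows but a dictionary of columns is what you need, one possible approach (not covered above, and shown here only as a sketch) is to transpose the data with zip(*...) before pairing it with the column names. The variable names are illustrative.

```python
import pandas as pd

list_one = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]  # Each sublist is a row
col_names = ["Column A", "Column B", "Column C"]
row_names = ["Row 1", "Row 2", "Row 3"]

# zip(*list_one) regroups the values by position: (1, 4, 7), (2, 5, 8), (3, 6, 9)
columns = [list(col) for col in zip(*list_one)]
dict_cols = dict(zip(col_names, columns))
print(dict_cols)

# A dictionary of columns goes straight into the constructor
df = pd.DataFrame(dict_cols, index=row_names)
print(df)
```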

Create Pandas DataFrame from List of Dicts

Importing a List of Dictionaries to Pandas DataFrames

There are some instances where a list contains elements which are dictionaries to be added to the DataFrame. It’s important to be able to convert these lists of dictionaries into Pandas DataFrames. For example, suppose we had the following list of dictionaries:

import pandas as pd
ListDictOne = [{"Column A":[1, 2, 3]},
		       {"Column B":[4, 5, 6]},
		       {"Column C":[7, 8, 9]}]

We can use a set of nested loops to consolidate these dictionaries into a single dictionary, which we can then convert into a DataFrame:

NewDict = {}  # Initialize a new dictionary
for listItem in ListDictOne:
	for key, value in listItem.items():  # Loop through all dictionary elements in the list
		if key in NewDict:  # If the key already exists, extend its list with the new values
			NewDict[key].extend(value)
		else:  # If it's a new key, add a copy so the source list is never modified
			NewDict[key] = list(value)
df = pd.DataFrame.from_dict(NewDict)  # Finally, create the DataFrame from the dictionary
print(df)
>    Column A  Column B  Column C
> 0         1         4         7
> 1         2         5         8
> 2         3         6         9
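
A different layout you are likely to meet is a list in which each dictionary describes one row, with column labels as keys. This is not the layout used above, but the pandas.DataFrame constructor accepts it directly, so no merging loop is needed. A minimal sketch with illustrative data:

```python
import pandas as pd

# Each dictionary is one row; the keys are the column labels
records = [{"Column A": 1, "Column B": 4, "Column C": 7},
           {"Column A": 2, "Column B": 5, "Column C": 8},
           {"Column A": 3, "Column B": 6, "Column C": 9}]

df = pd.DataFrame(records)
print(df)
```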

If this tutorial has made one thing clear, it’s that the best way of reading data into a Pandas DataFrame depends heavily on the layout of your data. Always know the format and structure of your data before implementing a conversion process.




  1. NumPy must be installed because Pandas relies on it for its array operations. Package managers such as pip and conda install NumPy automatically as a dependency of Pandas, and scientific Python distributions such as Anaconda ship with both. Installing these data science packages together helps ensure that all necessary libraries are present.

  2. Some older versions of Pandas do not include this option. For those versions, the column names can be assigned separately through the DataFrame object’s columns attribute.

  3. Your intuition may lead you to believe that this is an unnecessarily convoluted route. You would be correct.