- Introduction
- Convert Dictionary to Pandas DataFrame
- Convert List to Pandas DataFrame
- Create Pandas DataFrame from List of Dicts
Introduction to Importing Python Data to Pandas DataFrames
This tutorial covers the basics of importing data from native Python lists and dictionaries into Pandas DataFrame structures. Importing data from external sources, such as Microsoft Excel, is left to later tutorials. The Pandas DataFrame structure provides a suite of tools for manipulating and inspecting data. DataFrames provide a powerful backbone for any data science project, and knowing how to create them from existing data is crucial.
Remember that using the Pandas module requires both Pandas and Numpy [1] to be installed in the execution environment. Once the packages are installed, every module that uses Pandas needs the import statement import pandas at the start of the module. The examples in this tutorial use the common form import pandas as pd.
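A quick way to confirm that both packages are available is to import them and print their versions. This is only a minimal check, assuming the packages were installed with a tool such as pip or conda; the alias pd is a widely used convention rather than a requirement.

```python
import numpy as np
import pandas as pd  # "pd" is the conventional alias for Pandas

# Print the installed versions to confirm that both packages import correctly
print("pandas version:", pd.__version__)
print("numpy version:", np.__version__)
```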
This tutorial will show you how to convert dictionaries and lists to Pandas DataFrames. We’ll also explain how to create a Pandas DataFrame from a list of dicts and a list of lists. However your Python data is structured, this tutorial will show you how to convert it into a Pandas DataFrame.
Convert Dictionary to Pandas DataFrame
Pandas has a built-in way of converting a dictionary whose keys are column labels, and whose values are lists of table entries, into a DataFrame. In other words, Pandas will convert the following dictionary
dictOne = {"Column A":[1, 2, 3],
"Column B":[4, 5, 6],
"Column C":[7, 8, 9]}
into the following Pandas DataFrame:
import pandas as pd
pd.DataFrame(dictOne)
> Column A Column B Column C
> 0 1 4 7
> 1 2 5 8
> 2 3 6 9
As you can see from the above example, converting a dictionary to a DataFrame is as simple as calling the pandas.DataFrame class constructor on the dictionary. You don’t see a call to the from_dict method because the constructor accepts a dictionary of columns directly. If we want to access additional features for importing dictionaries, we’ll need to use the from_dict method directly. Let’s take a look at how we would do that.
from_dict Method
The pandas.DataFrame method from_dict has the format:
pandas.DataFrame.from_dict(data, orient='columns', dtype=None, columns=None)
where
- data is the dictionary object we wish to import (or convert).
- orient is how each dictionary entry forms the frame. Can be 'index' or 'columns', with 'columns' as default.
  - If each dictionary entry is a row of the DataFrame then use 'index'.
  - If each dictionary entry is a column of the DataFrame then use 'columns'.
- dtype is an option to force the given data into a specified type. For example, dtype=str will convert all entries into strings, and dtype=float will convert entries into floating point numbers. The default is to infer the data type from the input.
- columns takes a list of column labels, and can only be used if orient='index' [2].
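The dtype option described above can be illustrated with a short sketch. The dictionary named sample is made up for this illustration, and the exact integer type that Pandas infers may differ between platforms.

```python
import pandas as pd

# Hypothetical sample data, laid out as a dictionary of columns
sample = {"Column A": [1, 2, 3],
          "Column B": [4, 5, 6]}

# Without dtype, Pandas infers an integer type for each column
inferred = pd.DataFrame.from_dict(sample)
print(inferred.dtypes)

# With dtype=float, every entry is converted to a floating point number
as_float = pd.DataFrame.from_dict(sample, dtype=float)
print(as_float.dtypes)
print(as_float)
```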
We can replicate the above example by directly using the from_dict method like this:
import pandas as pd # Don't forget to import!
dictOne = {"Column A":[1, 2, 3],
"Column B":[4, 5, 6],
"Column C":[7, 8, 9]}
df = pd.DataFrame.from_dict(dictOne)
print(df)
> Column A Column B Column C
> 0 1 4 7
> 1 2 5 8
> 2 3 6 9
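To check that the constructor and from_dict give the same result for a dictionary of columns, the two frames can be compared with the DataFrame equals method. This is a small sketch; the names from_constructor and from_method are chosen only for this illustration.

```python
import pandas as pd

dictOne = {"Column A": [1, 2, 3],
           "Column B": [4, 5, 6],
           "Column C": [7, 8, 9]}

from_constructor = pd.DataFrame(dictOne)
from_method = pd.DataFrame.from_dict(dictOne)

# equals checks that both frames hold the same labels and elements
same = from_constructor.equals(from_method)
print(same)
```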
Now suppose we have a dictionary containing the rows of a DataFrame. We can import this data using the orient='index' option:
import pandas as pd # Don't forget to import!
dictTwo = {"Row 1":[1, 2, 3],
"Row 2":[4, 5, 6],
"Row 3":[7, 8, 9]}
df = pd.DataFrame.from_dict(dictTwo, orient='index')
print(df)
> 0 1 2
> Row 1 1 2 3
> Row 2 4 5 6
> Row 3 7 8 9
Now we see that the dictionary keys become the index labels, and the entries of each list fill the corresponding row. Notice that because a list of column labels was not specified, Pandas automatically used the integers 0, 1, 2 as the column names. We can add a custom set of names ourselves using the columns option.
import pandas as pd # Don't forget to import!
colNames = ["Column A", "Column B", "Column C"]
dictTwo = {"Row 1":[1, 2, 3],
"Row 2":[4, 5, 6],
"Row 3":[7, 8, 9]}
df = pd.DataFrame.from_dict(dictTwo, orient='index', columns=colNames)
print(df)
> Column A Column B Column C
> Row 1 1 2 3
> Row 2 4 5 6
> Row 3 7 8 9
It’s worth noting that older versions of Pandas (before 0.23) do not include the columns option. In these cases we can assign the list of names directly to the columns attribute of the DataFrame object to get the same result. After you convert your dictionary to a Pandas DataFrame, you would add your own column names using code like this:
colNames = ["Column A", "Column B", "Column C"]
df.columns = colNames
print(df)
> Column A Column B Column C
> Row 1 1 2 3
> Row 2 4 5 6
> Row 3 7 8 9
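The columns option is tied to orient='index'. As a sketch of what happens otherwise, recent versions of Pandas raise a ValueError when columns is combined with orient='columns'; the exact message may vary between versions, and the labels "X" and "Y" are placeholders made up for this example.

```python
import pandas as pd

dictOne = {"Column A": [1, 2, 3],
           "Column B": [4, 5, 6]}

try:
    # columns is only meant to be combined with orient='index'
    pd.DataFrame.from_dict(dictOne, orient="columns", columns=["X", "Y"])
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# For column-oriented data, assign the new names after construction instead
df = pd.DataFrame.from_dict(dictOne)
df.columns = ["X", "Y"]
print(df)
```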
Convert List to Pandas DataFrame
There are a wide variety of methods to convert lists into Pandas DataFrames. In this tutorial we will cover a couple of popular methods, each of which may suit your application better than the other. Most methods involve first converting the list data into a dictionary. However, a direct conversion from a list is possible and can be a more straightforward solution.
Method 1: List Directly to DataFrame
The simplest method is to store the data as a list of lists, which the pandas.DataFrame constructor can directly convert into a DataFrame.
import pandas as pd # Don't forget to import!
listOne = [[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]
df = pd.DataFrame(listOne)
print(df)
> 0 1 2
> 0 1 2 3
> 1 4 5 6
> 2 7 8 9
By default, this method attaches no column labels or row labels. If we want to include them, we can assign them to the columns and index attributes separately as lists:
listOne = [[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]
colNames = ["Column A", "Column B", "Column C"]
rowNames = ["Row 1", "Row 2", "Row 3"]
df = pd.DataFrame(listOne)
df.columns = colNames
df.index = rowNames
print(df)
> Column A Column B Column C
> Row 1 1 2 3
> Row 2 4 5 6
> Row 3 7 8 9
You can see how adding the column and row labels helps us organize our DataFrames for our data science projects.
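The labels can also be supplied when the DataFrame is created, since the pandas.DataFrame constructor accepts columns and index arguments. A minimal sketch, reusing the same lists as above:

```python
import pandas as pd

listOne = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
colNames = ["Column A", "Column B", "Column C"]
rowNames = ["Row 1", "Row 2", "Row 3"]

# Pass the labels straight to the constructor instead of assigning them afterwards
df = pd.DataFrame(listOne, columns=colNames, index=rowNames)
print(df)
```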
One drawback of this method is the case where the column and row indices are already included in the list of lists. In this case, the pandas.DataFrame constructor will naively store them as entries within the DataFrame:
listTwo = [["Rows", "Column A", "Column B", "Column C"],
["Row 1", 1, 2, 3],
["Row 2", 4, 5, 6],
["Row 3", 7, 8, 9]]
df = pd.DataFrame(listTwo)
print(df)
> 0 1 2 3
> 0 Rows Column A Column B Column C
> 1 Row 1 1 2 3
> 2 Row 2 4 5 6
> 3 Row 3 7 8 9
As you can see, the row indices and column names were included in the data. Each numeric column now also holds a string, so Pandas stores it with the generic object type rather than as integers. This isn’t what you want. A simple way around this is to strip the column names and row indices out of the list into separate lists prior to DataFrame conversion, and reattach them afterwards. This can be done using a script like the following:
listTwo = [["Rows", "Column A", "Column B", "Column C"],
["Row 1", 1, 2, 3],
["Row 2", 4, 5, 6],
["Row 3", 7, 8, 9]]
colNames = listTwo[0][1:] # Exclude the first entry, as the index doesn't need a name
del listTwo[0] # Remove the column names from the data
rowNames = [] # Initialize the row names list for looping
for row in listTwo:
    rowNames.append(row[0]) # Collect the row indices
    del row[0] # Delete row indices from data list
df = pd.DataFrame(listTwo)
df.columns = colNames # Add back in the column names
df.index = rowNames # Add back in the index names
print(df)
> Column A Column B Column C
> Row 1 1 2 3
> Row 2 4 5 6
> Row 3 7 8 9
You can take this Python script and adapt it to your own project. It’s a powerful way of cleaning up your lists so you can use them in Pandas DataFrames.
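The script above modifies listTwo in place. If the original list needs to stay untouched, the same cleanup can be sketched with slicing and list comprehensions. The function name labeled_list_to_frame is hypothetical, and it assumes the first row holds the column names and the first entry of every other row holds that row's label.

```python
import pandas as pd

def labeled_list_to_frame(table):
    """Hypothetical helper: build a DataFrame from a list of lists whose
    first row and first column hold the labels."""
    header, *rows = table
    col_names = header[1:]                 # Skip the corner cell ("Rows")
    row_names = [row[0] for row in rows]   # First entry of each data row
    data = [row[1:] for row in rows]       # Remaining entries are the values
    return pd.DataFrame(data, columns=col_names, index=row_names)

listTwo = [["Rows", "Column A", "Column B", "Column C"],
           ["Row 1", 1, 2, 3],
           ["Row 2", 4, 5, 6],
           ["Row 3", 7, 8, 9]]

df = labeled_list_to_frame(listTwo)
print(df)
print(df.dtypes)      # The values are stored as integers, not generic objects
print(len(listTwo))   # The original list still contains all four rows
```

Because the helper only reads from its argument, it can be called repeatedly on the same list without losing the labels.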
Method 2: List to Dictionary to DataFrame Conversion
A less direct, but popular, method of converting lists to DataFrames is to first convert your lists into dictionaries. After that, you’d follow the instructions in our convert dictionary to Pandas DataFrame section. This method takes more steps and is less “Pythonic” than the direct list conversion, but you may prefer this way based on the structure of your data or processes. The choice is yours.
We can start by creating a list of column names and a list of row names. The script presented earlier, which splits a list of lists into separate lists of row indices, column names, and data, can be used for this. We’ll assume that these three lists have already been created in the following examples.
One way to join lists into dictionaries is to use the Python zip function to pair each column name with a sublist of data as a tuple. These tuples are in turn converted into a dictionary, which is then converted into a DataFrame [3].
listOne = [[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]
colNames = ["Column A", "Column B", "Column C"]
rowNames = ["Row 1", "Row 2", "Row 3"]
dictOne = dict(zip(colNames, listOne))
print(dictOne)
> {'Column A': [1, 2, 3], 'Column B': [4, 5, 6], 'Column C': [7, 8, 9]}
Notice that the above method assumes that the sublists are columns instead of rows. If each sublist is a row, then you’ll need to use the row indices instead of the column names, like this:
listOne = [[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]
colNames = ["Column A", "Column B", "Column C"]
rowNames = ["Row 1", "Row 2", "Row 3"]
dictTwo = dict(zip(rowNames, listOne))
print(dictTwo)
> {'Row 1': [1, 2, 3], 'Row 2': [4, 5, 6], 'Row 3': [7, 8, 9]}
Now we can implement the DataFrame using the conversion from dictionary to DataFrame:
# Assuming we have a dictionary of rows in dictTwo
df = pd.DataFrame.from_dict(dictTwo, orient='index')
df.columns = colNames
print(df)
> Column A Column B Column C
> Row 1 1 2 3
> Row 2 4 5 6
> Row 3 7 8 9
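If the sublists are rows but a dictionary of columns is preferred, the rows can first be transposed with zip(*rows). The sketch below is one possible approach rather than a required step; the names rows, columns, and dictOfColumns are chosen only for this example.

```python
import pandas as pd

rows = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
colNames = ["Column A", "Column B", "Column C"]
rowNames = ["Row 1", "Row 2", "Row 3"]

# zip(*rows) groups the first entries of every row, then the second, and so on
columns = [list(col) for col in zip(*rows)]
dictOfColumns = dict(zip(colNames, columns))
print(dictOfColumns)

# The dictionary of columns goes to the constructor, with the row labels as the index
df = pd.DataFrame(dictOfColumns, index=rowNames)
print(df)
```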
Create Pandas DataFrame from List of Dicts
Importing a List of Dictionaries to Pandas DataFrames
There are some instances where a list contains elements which are dictionaries to be added to the DataFrame. It’s important to be able to convert these lists of dictionaries into Pandas DataFrames. For example, suppose we had the following list of dictionaries:
import pandas as pd
ListDictOne = [{"Column A":[1, 2, 3]},
{"Column B":[4, 5, 6]},
{"Column C":[7, 8, 9]}]
We can use a set of nested loops to consolidate these dictionaries into a single dictionary, which we can then convert into a DataFrame:
NewDict = {} # Initialize a new dictionary
for listItem in ListDictOne: # Loop through every dictionary in the list
    for key, value in listItem.items(): # Loop through every entry in the dictionary
        if key in NewDict: # If the key already exists, append the new values
            NewDict[key].extend(value)
        else: # If it's a new key, store a copy of its list
            NewDict[key] = list(value)
df = pd.DataFrame.from_dict(NewDict) # Finally, create the DataFrame from the dictionary
print(df)
> Column A Column B Column C
> 0 1 4 7
> 1 2 5 8
> 2 3 6 9
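A list of dictionaries is also often laid out with one dictionary per row, where each key is a column label. The pandas.DataFrame constructor accepts that layout directly, as the following sketch shows; the records list is made-up sample data.

```python
import pandas as pd

# Hypothetical record-style data: one dictionary per row
records = [{"Column A": 1, "Column B": 4, "Column C": 7},
           {"Column A": 2, "Column B": 5, "Column C": 8},
           {"Column A": 3, "Column B": 6, "Column C": 9}]

# Each dictionary becomes a row and each key becomes a column label
df = pd.DataFrame(records)
print(df)
```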
If this tutorial has made one thing clear, it’s that the best way of reading data into a Pandas DataFrame depends heavily on the layout of your data. Always know the format and structure of your data before implementing a conversion process.
[1] Numpy must be installed, as it contains libraries that the Pandas module uses for various data array operations. Note that both packages come pre-installed with many scientific Python distributions, and can be installed alongside the SciPy package. It is highly recommended to install these data science packages together to ensure that all necessary libraries are included.
[2] Some older versions of Pandas do not include this option. For those versions, a separate assignment to the DataFrame object’s columns attribute can be used to set the column names.
[3] Your intuition may lead you to believe that this is an unnecessarily convoluted route. You would be correct.