When importing or manipulating data within a Pandas DataFrame, you may notice that the data is represented as strings of numbers, rather than the numeric types themselves. Strings cannot be used in numeric calculations, and will not produce numerical summary data when using the Pandas describe function. To remedy this, we can use a variety of Pandas operations to convert string data to numeric types.

Suppose we have a set of grades data contained within the Pandas DataFrame grades. We can use the Pandas head and info methods to examine the data:

import pandas as pd # Don't forget to import pandas!
grades.head()  # Look at the first few rows (head) of the data
>   StudentID Homework Midterm Project Final
> 0      4560      100   97.68     100     A
> 1      5540    85.68   90.02   88.54     B
> 2      6889    92.06   85.74   88.84     B
> 3      6817    65.02    85.5   87.86     C

grades.info()  # Output information about the DataFrame itself
> <class 'pandas.core.frame.DataFrame'>
> RangeIndex: 4 entries, 0 to 3
> Data columns (total 5 columns):
> StudentID    4 non-null object
> Homework     4 non-null object
> Midterm      4 non-null object
> Project      4 non-null object
> Final        4 non-null object
> dtypes: object(5)
> memory usage: 240.0+ bytes

We can see from the info output that every column has been imported with the object dtype, which in this case means the values are strings. If we try to summarize the data with the describe method, we receive a description of the objects rather than numeric summaries:

grades.describe()
>        StudentID Homework Midterm Project Final
> count          4        4       4       4     4
> unique         4        4       4       4     3
> top         5540    85.68   97.68   88.84     B
> freq           1        1       1       1     2
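The step that loads grades is not shown here. To follow along, a DataFrame with the same contents could be built by hand. This is only a minimal sketch: the values are copied from the head output above and stored as strings on purpose, which is an assumption about how the original data was loaded.

```python
import pandas as pd

# Hypothetical reconstruction of the grades data shown above.
# Every value is deliberately stored as a string.
grades = pd.DataFrame({
    "StudentID": ["4560", "5540", "6889", "6817"],
    "Homework": ["100", "85.68", "92.06", "65.02"],
    "Midterm": ["97.68", "90.02", "85.74", "85.5"],
    "Project": ["100", "88.54", "88.84", "87.86"],
    "Final": ["A", "B", "B", "C"],
})

# Inspect the column types; none of them should be numeric yet
print(grades.dtypes)
```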

This tutorial will cover how to convert these Pandas strings into numbers so you can evaluate them numerically.


Pandas Convert String Column to Numeric

The simplest way to convert Pandas DataFrame data into numeric types is the Pandas to_numeric function. This function has the format [Numeric Column] = pandas.to_numeric([String Column]) where [String Column] is the column of strings we wish to convert (see note 1), and [Numeric Column] is the new column of converted numbers. To convert a column within a DataFrame, you can simply assign the new numeric column back to the original column in the DataFrame. Since to_numeric converts a single column of Pandas strings into numbers, you’ll need to iterate over the columns with a for loop to convert all of them. Take a look at this Python example to find out how:

import pandas as pd 
cols = ['StudentID', 'Homework', 'Midterm', 'Project']  # We don't want to convert the Final grade column.
for col in cols:  # Iterate over chosen columns
	grades[col] = pd.to_numeric(grades[col])

grades.info()
> <class 'pandas.core.frame.DataFrame'>
> RangeIndex: 4 entries, 0 to 3
> Data columns (total 5 columns):
> StudentID    4 non-null int64
> Homework     4 non-null float64
> Midterm      4 non-null float64
> Project      4 non-null float64
> Final        4 non-null object
> dtypes: float64(3), int64(1), object(1)
> memory usage: 240.0+ bytes

grades.describe()  # Now describe will report numeric summaries
>          StudentID    Homework    Midterm     Project
> count     4.000000    4.000000   4.000000    4.000000
> mean   5951.500000   85.690000  89.735000   91.310000
> std    1115.586692   14.973332   5.689156    5.807822
> min    4560.000000   65.020000  85.500000   87.860000
> 25%    5295.000000   80.515000  85.680000   88.370000
> 50%    6178.500000   88.870000  87.880000   88.690000
> 75%    6835.000000   94.045000  91.935000   91.630000
> max    6889.000000  100.000000  97.680000  100.000000

See how the describe function now reports a numeric summary of our DataFrame? That’s the advantage of converting your Pandas strings to numbers.
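The for loop is not the only way to convert several columns. As a sketch of one alternative, DataFrame.apply can call pd.to_numeric once per column. The small DataFrame here is a hypothetical stand-in for grades, so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical two-row stand-in for the grades DataFrame
grades = pd.DataFrame({
    "StudentID": ["4560", "5540"],
    "Homework": ["100", "85.68"],
    "Final": ["A", "B"],
})

cols = ["StudentID", "Homework"]  # Leave the letter grades alone

# apply passes each selected column (a Series) to pd.to_numeric
grades[cols] = grades[cols].apply(pd.to_numeric)

print(grades.dtypes)
```

Whether you prefer the loop or apply is mostly a matter of taste. Both convert one column at a time.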

The problem with this approach is that the to_numeric function guesses which data type to convert your strings to. As the info output shows, it converted the StudentID column to int64 and the Homework, Midterm, and Project columns to float64.
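The guess depends on what is in the column. The sketch below uses small made-up Series (not the grades data) to show the inference, along with the errors="coerce" option of to_numeric. By default an unparseable string raises an error. With errors="coerce" it becomes NaN instead, which also forces a float result.

```python
import pandas as pd

# Hypothetical example columns, not part of the grades data
whole = pd.Series(["1", "2", "3"])
mixed = pd.Series(["1", "2.5", "3"])
dirty = pd.Series(["100", "85.68", "absent"])

as_whole = pd.to_numeric(whole)   # Only whole numbers: an integer type is chosen
as_mixed = pd.to_numeric(mixed)   # One decimal value: a float type is chosen

# errors="coerce" replaces unparseable entries with NaN instead of raising
as_dirty = pd.to_numeric(dirty, errors="coerce")

print(as_whole.dtype, as_mixed.dtype, as_dirty.dtype)
print(as_dirty.isna().sum(), "value(s) could not be parsed")
```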

What if you wanted more control of how your numbers were converted? Pandas has a function for that, too!



Pandas Convert General Data Types

For most applications, the pandas.to_numeric function above can be used. However, in situations where Pandas may not choose the type you want (e.g. it converts strings to floats when you need integers), we can use the astype method to specify the exact data type to which we wish to convert the data. This method is more generic than pandas.to_numeric, as it can convert to any data type. It has the format [dtype2 Column] = [dtype1 Column].astype(dtype=[dtype2]) where [dtype1 Column] is the original column or DataFrame (see note 2) and [dtype2 Column] is the output column or DataFrame converted to the data type [dtype2], where [dtype2] is a NumPy data type. That was a mouthful. Take a look at these examples to help it make more sense.


Pandas Convert String to Float

Strings can be converted to floats using the astype method with the Numpy data type numpy.float64:

import pandas as pd
import numpy as np  # To use the float64 dtype, we will need to import numpy
cols = ['Homework', 'Midterm', 'Project']
for col in cols:
	grades[col] = grades[col].astype(dtype=np.float64)
grades.info()
> <class 'pandas.core.frame.DataFrame'>
> RangeIndex: 4 entries, 0 to 3
> Data columns (total 5 columns):
> StudentID    4 non-null object
> Homework     4 non-null float64
> Midterm      4 non-null float64
> Project      4 non-null float64
> Final        4 non-null object
> dtypes: float64(3), object(2)
> memory usage: 240.0+ bytes

The info output shows that the Homework, Midterm, and Project columns have all been converted from Pandas strings to floats using the astype method. Specifically, the three columns are now float64 data types.


Pandas Convert String to Int

Pandas strings can be converted to integers using the astype method with the Numpy data type numpy.int64:

import pandas as pd
import numpy as np  # To use the int64 dtype, we will need to import numpy
grades["StudentID"] = grades["StudentID"].astype(dtype=np.int64)
grades.info()
> <class 'pandas.core.frame.DataFrame'>
> RangeIndex: 4 entries, 0 to 3
> Data columns (total 5 columns):
> StudentID    4 non-null int64
> Homework     4 non-null object
> Midterm      4 non-null object
> Project      4 non-null object
> Final        4 non-null object
> dtypes: int64(1), object(4)
> memory usage: 240.0+ bytes

In this example, the StudentID column was converted from strings to integers, as is evident from the output of the info method.
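The float and integer conversions can also be done in a single call, because DataFrame.astype accepts a dictionary that maps column names to data types. The sketch below uses a hypothetical two-row stand-in for grades so that it runs on its own.

```python
import pandas as pd

# Hypothetical two-row stand-in for the grades DataFrame
grades = pd.DataFrame({
    "StudentID": ["4560", "5540"],
    "Homework": ["100", "85.68"],
    "Final": ["A", "B"],
})

# Map each column to the exact dtype we want; unlisted columns are unchanged
grades = grades.astype({"StudentID": "int64", "Homework": "float64"})

print(grades.dtypes)
```

Columns missing from the dictionary, such as Final here, keep their original type.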




  1. Technically, pandas.to_numeric takes a pandas.Series object and returns the same. pandas.Series objects are data vectors with associated indices and metadata, which for all practical purposes are DataFrames of a single column. 

  2. Unlike pandas.to_numeric, which works on one column (a pandas.Series) at a time, astype can convert an entire DataFrame. Because we usually only want to convert specific columns within a DataFrame, we can convert columns individually as we did with the pandas.to_numeric function. 

This article was written by Cody Gilbert, contributing writer for The Python Tutorials Blog.

About The Python Tutorials Blog

Ryan Wells

The Python Tutorials Blog was created by Ryan Wells, a Nuclear Engineer and professional VBA Developer. Ryan developed a unique 3-part free Excel training program to help others quickly learn VBA in a natural setting: right inside Excel. After his successful VBA Tutorials, which have helped hundreds of thousands learn to write better macros, he built The Python Tutorials Blog to teach people Python in a similar systematic way.