Introduction to Python String Operations and Formatting

In our previous tutorials, we used strings to supply input and output information to both the terminal and external files. This tutorial will take an in-depth look at Python strings, particularly Python string operations, and how we can format strings in output. If you follow the Python Tutorials Blog, you’re probably used to seeing the simple print function:

s = 1/7
print(s)
> 0.14285714285714285

In most settings, 17 decimals of precision would be misleading and/or unwise. Previously, we’ve used the round function to mathematically round these numbers to a set number of decimals:

print(round(s, 4))
> 0.1429

But what if we want to truncate this number instead? What if we want to represent this number in scientific notation? To do this without dedicated formatting, we would need to create a complex numeric function. Luckily for us, Python has a collection of string operations and formatting methods to solve this problem.


Python String Operations

Before we look at outputting strings, let’s look at a selection of Python string methods and functions that can be used to change strings. Python strings are immutable, so these methods never modify the original string; each one takes in a string and returns a new object (usually a new string) with the given operation performed. This is different from languages in which a function changes the given string object in place as a side effect, without a return value. Therefore, we will always need to “catch” the output of these methods in a new (or the existing) variable if we wish to save the results.
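
To make the “catch the output” point concrete, here is a minimal sketch. The variable names and the sample text are made up for illustration, and it borrows the upper method that is described later in this tutorial.

```python
title = "a tale of two cities"

# Calling a method on its own does not change the original string.
title.upper()
unchanged = title
print(unchanged)   # a tale of two cities

# Catch the result in a new variable...
shouted = title.upper()
print(shouted)     # A TALE OF TWO CITIES

# ...or reassign the existing name if only the new value is needed.
title = title.upper()
print(title)       # A TALE OF TWO CITIES
```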


split Method

The split method is used to convert a Python string into a list of substrings, separated by a given delimiter. split has the format [string].split([delimiter]), where [delimiter] is the delimiting character (or substring of characters), and [string] is the string to be split. For example:

s = "The best of times."
print(s.split(" ")) # Split with a space delimiter
> ['The', 'best', 'of', 'times.']

The Python split operation is useful for splitting lines read from a file, for example with the readline method.
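
As a rough sketch of that use case, the snippet below splits one comma-separated line. The line itself is a hypothetical stand-in for text read from a file, and the field names are assumptions made for the example.

```python
# A hypothetical line as it might arrive from a file, newline included.
line = "widget,4,2.50\n"

# Remove the trailing newline, then split on the comma delimiter.
fields = line.strip().split(",")
print(fields)   # ['widget', '4', '2.50']

# Every piece is still a string, so convert before doing arithmetic.
name, quantity, price = fields
total = int(quantity) * float(price)
print(total)    # 10.0

# With no delimiter given, split() separates on any run of whitespace.
words = "The   best\tof times.\n".split()
print(words)    # ['The', 'best', 'of', 'times.']
```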


join Method

The join method performs the reverse of the split method; it takes a list of Python strings and concatenates them with a given delimiter. join has the format [delimiter].join([list]), where [delimiter] is the delimiting character (or substring of characters), and [list] is the list of strings to be joined. Notice that the inputs appear in the opposite order from the split method.

t = ['The', 'best', 'of', 'times.']
print(" ".join(t))  # Joining with a space delimiter
> The best of times.

Where the Python split method separates strings, the Python join method is the string operation to bring them back together.
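
A short sketch of the round trip may help here. The list of numbers is invented for the example; it is there to show that join expects every item to already be a string.

```python
s = "The best of times."

# Splitting and joining with the same delimiter restores the original.
words = s.split(" ")
rebuilt = " ".join(words)
print(rebuilt == s)   # True

# Joining with a different delimiter changes the separator.
dashed = "-".join(words)
print(dashed)         # The-best-of-times.

# join only accepts strings, so convert other types first.
numbers = [1, 2, 3]
csv = ",".join(str(n) for n in numbers)
print(csv)            # 1,2,3
```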


count Method

The count method counts the number of non-overlapping instances of a substring within a given string. The count string operation is case sensitive and has the format [string].count([substring]), where [substring] is the substring to be counted, and [string] is the string from which substrings will be counted. This example will help you make more sense of that.

s = "It was the best of times."
print(s.count("t"))  # Count the number of t's in a string
> 4
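
Two details from the description are easy to overlook: case sensitivity and non-overlapping matches. The following sketch illustrates both; the "aaaa" string is just a made-up test case.

```python
s = "It was the best of times."

# Case sensitive: the capital "I" in "It" is not counted.
exact = s.count("i")
print(exact)      # 1

# Lowercase the string first to count regardless of case.
any_case = s.lower().count("i")
print(any_case)   # 2

# Non-overlapping: "aa" fits in "aaaa" twice, not three times.
pairs = "aaaa".count("aa")
print(pairs)      # 2
```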

strip Method

The strip method removes characters from the beginning and end of a given string. strip has the format [string].strip([chars]), where [chars] is a string listing the characters to be removed, and [string] is the string from which they will be removed. The argument is treated as a set of characters rather than as a single substring, and if it is omitted, whitespace is removed. For example, this command is useful for removing an unknown number of asterisks from a given string of currency:

s = "****10.00***"
print(s.strip("*"))
> 10.00
s = "$****10.00"  # Note that strip does not remove internal substrings
print(s.strip("*"))
> $****10.00  # The dollar sign prevented removal of the internal asterisks

strip will strip the given characters from both the left and right sides of the string, but will not touch characters in the middle. It’s a string cleanup operation. lstrip is used to only strip characters from the left side of the string, and rstrip will do the same for the right side of your Python strings.

s = "****10.00***"
print(s.lstrip("*"))
> 10.00***
print(s.rstrip("*"))
> ****10.00
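
One point worth sketching is that the argument to strip is treated as a set of characters, not as one exact substring, and that calling it with no argument removes whitespace. The sample strings below are made up for illustration.

```python
# Any mix of "x" and "y" is removed from both ends.
mixed = "xyxhelloyxy".strip("xy")
print(mixed)            # hello

# With no argument, leading and trailing whitespace is removed.
trimmed = "  10.00 \n".strip()
print(repr(trimmed))    # '10.00'

# Listing both "$" and "*" clears the whole prefix in one call.
amount = "$****10.00".lstrip("$*")
print(amount)           # 10.00
```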

replace Method

The replace method will replace a given substring with another substring inside a given string. replace has the format [string].replace([old],[new],[count]) where [string] is the string from which the substrings will be replaced, [old] is the substring to be replaced, [new] is the new substring to be inserted, and [count] is an optional specifier for the number of substrings you want to replace. The default for [count] is to replace all substrings.

s = "$****10.00" 
print(s.replace("*", ""))  # Substitute asterisks with nothing
> $10.00
s = "The best of times."
print(s.replace("best", "worst"))
> The worst of times.

Where the strip operation is used to clean up characters from the left and right side of your string, the replace method can be used to remove (or replace) characters throughout your string.
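
The optional [count] argument is not used in the examples above, so here is a small sketch of it. The date-like string is a hypothetical input chosen for the example.

```python
s = "2024-01-15-draft"

# Replace only the first two hyphens.
first_two = s.replace("-", "/", 2)
print(first_two)   # 2024/01/15-draft

# Without a count, every occurrence is replaced.
every = s.replace("-", "/")
print(every)       # 2024/01/15/draft

# If the old substring is absent, the string comes back unchanged.
same = s.replace("x", "y")
print(same)        # 2024-01-15-draft
```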


upper and lower Methods

The upper and lower methods are used to change the capitalization of letters within a given string. These methods have the format [string].upper() and [string].lower(), where [string], of course, is the string to be modified.

s = "It was the BEST of times!" 
print(s.upper()) 
> IT WAS THE BEST OF TIMES!
print(s.lower()) 
> it was the best of times!

These methods can be handy when requesting strings from a user that may have unknown capitalization.

s = input("Continue? Input y/n: ")
> Continue? Input y/n: 
>> N
if s.lower() == "n":  # Now the user can enter n or N.
    exit()

find Method

The find method will search a string from left to right, and return the first (lowest) index of a given substring. It’s important to remember that index counting starts at 0 on the far left of your string. The find Python string operation both determines if the string contains the substring, and returns its location. find has the format [string].find([substring]), where [string] is the string to be searched and [substring] is the substring to be found. The method will return -1 if the substring cannot be found. This is similar to the VBA InStr function for my VBA readers out there.

s = "It was the best of the times." 
print(s.find("the")) 
> 7

Similarly, rfind will behave the same way, but will find the last (highest) index of the given substring:

s = "It was the best of the times." 
print(s.rfind("the")) 
> 19

If you only need to validate that the string contains a given substring rather than return its location, the in operator can be used to produce a Boolean value:

s = "It was the best of the times." 
print("the" in s) 
> True

This is very handy when searching line by line through a file for a particular string.
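
Here is a minimal sketch of that line-by-line search. The list of lines stands in for the contents of a file, and the names lines, target, and matches are assumptions made for the example. It also shows why the result of find should be compared against -1 rather than used directly as a condition.

```python
# Hypothetical lines, standing in for text read from a file.
lines = [
    "It was the best of times,",
    "it was the worst of times,",
    "it was the age of wisdom,",
]
target = "worst"

matches = []
for number, line in enumerate(lines, start=1):
    position = line.find(target)
    if position != -1:          # compare against -1 explicitly
        matches.append((number, position))
print(matches)   # [(2, 11)]

# Pitfall: find returns 0 for a match at the very start, and 0 is falsy,
# while -1 (no match) is truthy. Use "in" for a plain yes/no check.
starts_at_zero = "the best".find("the")
print(starts_at_zero)         # 0
print(bool(starts_at_zero))   # False
print("the" in "the best")    # True
```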


center Method

The center method will center a string by applying a padding character on both sides to achieve a given total length. center has the format [string].center([totallength],[pad]), where [string] is the given string, [pad] is the single character you want to use for padding, and [totallength] is your desired total width of the final string. The default character for [pad] is a space.

This Python string operation is useful when you want to print headers with a standard format. Note that [totallength] is the length of the output string, including the original given string.

s = "Introduction" 
t = "Table of Contents" 
print(s.center(30, "-")) 
> ---------Introduction---------
print(t.center(30, "-"))
> ------Table of Contents-------
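
The examples above always pass a padding character. As a small sketch of the default behavior, the snippet below omits it; the title text and widths are arbitrary values picked for illustration.

```python
title = "Summary"

# With no pad character, spaces are used.
padded = title.center(15)
print(repr(padded))   # '    Summary    '
print(len(padded))    # 15

# If the requested width is shorter than the string, nothing is cut off.
too_narrow = title.center(3)
print(too_narrow)     # Summary
```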

Python String Formatting

Now that we’ve looked at manipulating string objects, let’s look at how to change the output of variables in print statements and file output.

This tutorial covers two ways to format Python output:

  1. the old % operation, and
  2. the newer format method operation.

The % operation for formatting output was used in older releases of Python, and most legacy code and many online tutorials still use it. The format method is the preferred option of the two for modern Python, and some IDEs may even flag the % method as an error if they assume that % is only used as the binary modulo operator. The % method can still be used in Python 3, but the format method is preferred unless you are working with older code.


% Method

The % method works by inserting conversion specifiers into a string, each of which will be substituted by a value supplied after the string. Let’s look at the details of these specifiers, then we’ll go over some examples.

Specifiers have the following format of required and optional parameters, which must be inserted in this order:

  1. The character %
  2. (Optional) Mapping key contained within parentheses: (key)
  3. (Optional) Conversion flags (see below)
  4. (Optional) Minimum field width, or how much character space must be saved for the input variable
  5. (Optional) Precision, written as a “.” followed by how many decimals are included in floating point numbers
  6. Conversion Type (presented in table below)

After the string containing these specifiers, a % followed by a single value, a tuple, or a dictionary is used to supply the objects to be inserted.
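
To show how the pieces line up in the order listed above, here is a sketch of one specifier that uses every part. The price variable and its value are made up for the example.

```python
price = 3.14159

# Reading the specifier %(price)+08.2f from left to right:
#   %        start of the specifier
#   (price)  mapping key, looked up in the dictionary
#   +        flag: always show a sign
#   0        flag: pad with zeros
#   8        minimum field width
#   .2       precision (two decimals)
#   f        conversion type (floating point)
labelled = "Total: %(price)+08.2f" % {"price": price}
print(labelled)   # Total: +0003.14

# Without a mapping key, the value is supplied directly after the %.
plain = "Total: %8.2f" % price
print(plain)      # Total:     3.14
```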

The following is a table of conversion types that can be used:

Python Conversion Types

| Symbol | Output Format | Note |
| --- | --- | --- |
| "s" | String | Will convert non-string types into strings (e.g. floats and integers) |
| "d" | Signed (+/-) Integer Decimal | "i" can also be used |
| "f" | Floating Point Decimal | "F" can also be used |
| "e" | Floating Point in Scientific Notation (e.g. 1.2e+10) | The symbol "E" outputs a capital "E" in output (e.g. 1.2E+10) |
| "g" | Floating Point with Variable Scientific Notation | Will output a regular floating point, switching to scientific notation if the exponent is less than -4 or not less than the precision (default is 6). "G" will output a capital "E" in exponents |
| "%" | % Symbol | Used as an escape character for literally inserting a % into a string |

There are additional conversion types available for output in hexadecimal and octal, but those are less commonly used.
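
As a quick sketch of the conversion types in the table, the snippet below runs each one on a sample value. The values are arbitrary and were chosen so that "g" shows both of its output styles.

```python
as_string = "%s and %s" % (3.5, [1, 2])   # non-strings are converted
print(as_string)    # 3.5 and [1, 2]

fixed = "%f" % 0.5                        # six decimals by default
print(fixed)        # 0.500000

scientific = "%e" % 12000000000.0
print(scientific)   # 1.200000e+10

# "g" keeps ordinary notation for moderate exponents...
general_small = "%g" % 0.0001
print(general_small)   # 0.0001

# ...and switches to scientific notation for very small or large values.
general_tiny = "%g" % 0.00001
general_big = "%g" % 1234567.0
print(general_tiny)    # 1e-05
print(general_big)     # 1.23457e+06

# "%%" inserts a literal percent sign.
percent = "%d%%" % 50
print(percent)      # 50%
```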

The following is a set of conversion flags for specifying additional operations within the specifier:

Python Conversion Flags

| Symbol | Operation | Note |
| --- | --- | --- |
| "0" | Zero Padding for Numeric Values | If a number is narrower than the specified output width, "0"s will be used as padding |
| "-" | Left Align | The value is left-aligned within the field. Overrides "0" padding above |
| " " (Space) | Leading Space | Leaves a blank before positive numbers to align with negative numbers starting with a "-" |
| "+" | Force Preceding Sign Character (+/-) | Will always show a sign for numbers, whether negative or positive |

Okay, I know we just threw a lot of information out there. Let’s take a look at an example to help clear things up. Suppose we wanted to insert a number into a string:

n = 42
s = "The secret number is %d" % n
print(s)
> The secret number is 42

Let’s break down the code from above. Inside of a string, we inserted the specifier %d. The specifier started with the required %, had no optional modifiers, and ended with the required conversion type d for a decimal output format. The string was followed by another %, indicating that we wish to format the string with variables, followed by the variable we wanted to insert. A single value can be given on its own like this; a length 1 tuple, (n,), would work as well.
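
One situation where the tuple form matters is when the value being inserted is itself a tuple. The sketch below uses a made-up point variable to show the error and one way around it.

```python
point = (3, 4)   # hypothetical value that happens to be a tuple

# A bare tuple is unpacked as two arguments for a single %s,
# which raises a TypeError.
raised = False
try:
    "Point: %s" % point
except TypeError:
    raised = True
print(raised)    # True

# Wrapping it in a length 1 tuple inserts the whole value.
s = "Point: %s" % (point,)
print(s)         # Point: (3, 4)
```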

We can also use more advanced conversion types on the above example:

n = 42
m = -22
s = "The secret number is %(is)d, not %(isnot)d" % {"is": n, "isnot": m}  # Use a dictionary as input
print(s)
> The secret number is 42, not -22
s = "The secret number is %d, not %d" % (n, m)  # Use a tuple as input
print(s)
> The secret number is 42, not -22
s = "The secret number is %+d, not %+d" % (n, m)  # Force use of signs
print(s)
> The secret number is +42, not -22

n = 42.2222222222222
m = -22.0
s = "The secret number is %07.2f, not %07.2f" % (n, m)  # Pad with 0's, force length of 7, with 2 decimals of precision for floating point
print(s)
> The secret number is 0042.22, not -022.00
s = "The secret number is %.2e, not %.2E" % (n, m)  # Use 2 decimals of precision, in two types of scientific notation
print(s)
> The secret number is 4.22e+01, not -2.20E+01

Formatting Python strings can be complicated. The best way to get comfortable with all the syntax is to experiment. Create short Python programs like the ones above and tinker with the % strings using the tables above.
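
As one starting point for that kind of tinkering, here is a sketch that tries each conversion flag on the same number. The square brackets are only there to make the padding visible.

```python
n = 42

no_flag = "[%6d]" % n      # width 6, right-aligned by default
zero_pad = "[%06d]" % n    # "0" flag pads with zeros
left = "[%-6d]" % n        # "-" flag left-aligns
space_pos = "[% d]" % n    # " " flag leaves a blank before positives
space_neg = "[% d]" % -n   # negatives use that position for the "-"
signed = "[%+d]" % n       # "+" flag always shows a sign

print(no_flag)     # [    42]
print(zero_pad)    # [000042]
print(left)        # [42    ]
print(space_pos)   # [ 42]
print(space_neg)   # [-42]
print(signed)      # [+42]
```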


format Method

The format method is called on the string object itself, rather than being an operator applied to the string from outside like the older % method.

The format method follows Python’s object-oriented conventions more closely, and is preferred for use in new programs. The format method uses specifiers within a Python string, followed by a method call that supplies the values that should replace those specifiers.

At first glance the format method can look more confusing than the % method, but in practice it really is more straightforward - it just has more options. The nested lists of options will be detailed below. Don’t worry. I’ll follow up with some examples.

Like the % method, there is a required order of symbols used in the specifier:

  1. Open curly brace “{”
  2. (Optional) Reference keyword (or integer for tuple of inputs)
  3. (Optional) “!” + Conversion Type. Converts the given variable into a string before formatting, using “!s” (str), “!r” (repr), or “!a” (ascii).
  4. (Optional) “:” + Format Specifier. See below.
  5. Close curly brace “}”
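
The reference and conversion parts of the list above are easy to miss, so here is a brief sketch of them. The word variable and the sample strings are assumptions made for the example.

```python
word = "best"

# "!s" applies str() and "!r" applies repr() before formatting.
as_str = "{!s}".format(word)
as_repr = "{!r}".format(word)
print(as_str)    # best
print(as_repr)   # 'best'

# Reference, conversion, and format specifier combined, in that order.
combined = "[{0!r:>10}]".format(word)
print(combined)  # [    'best']

# Integer references pick inputs by position, in any order.
swapped = "{1} {0}".format("times", "best")
print(swapped)   # best times
```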

The format specifier (item 4 in the list above) follows an additional set of ordering rules, which closely (but not exactly) mimic the % format specifiers. These specifiers must follow the given order:

  1. Fill Character. Default is “ ” (space)
  2. Alignment character. See table below.
  3. Sign Option. See table below.
  4. Minimum character width
  5. “,” to specify commas as thousands separators
  6. “.” followed by number of decimals of precision
  7. Output Type. See table below.

The following are the available alignment characters that specify how the string is aligned within the given width:

Python Format Alignment Characters

| Symbol | Output Format | Note |
| --- | --- | --- |
| "<" | Left Aligned | Forces left alignment. Default for all classes except numerics. |
| ">" | Right Aligned | Forces right alignment. Default for numeric types. |
| "=" | Padding After Sign | Forces padding after the sign, before a number |
| "^" | Center Align | Forces the field to be centered |

The following is a table of output types that can be used with the format method. Note: While there are many similarities between this table and the one for the % method, not all entries are identical. The default value for each of these depends on the Python class type of the referenced variable (i.e. int, float, string, etc.). If a type is omitted from the specifier, then the default for the referenced object’s class will be used.

Python Format Output Types

| Symbol | Output Format | Note |
| --- | --- | --- |
| "s" | String | Default for strings |
| "d" | Signed (+/-) Integer Decimal | Default for integer types |
| "f" | Floating Point Decimal | "F" can also be used |
| "e" | Floating Point in Scientific Notation (e.g. 1.2e+10) | The symbol "E" outputs a capital "E" in output (e.g. 1.2E+10) |
| "g" | General Format | Will output a regular floating point, switching to scientific notation if the exponent is less than -4 or not less than the precision (default is 6). "G" will output a capital "E" in exponents. Floats with no type given are formatted similarly. |
| "%" | Percents | Multiplies number by 100 and outputs it in "f" format, followed by a % sign |

In most cases, the format method follows the same convention as the % method, but replaces % specifiers with : and contains all references within curly braces {}.
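
Before repeating the earlier examples, here is a sketch of the fill, alignment, and thousands separator options, which have no direct counterpart in the examples from the % section. The values are arbitrary, and the square brackets only make the padding visible.

```python
left = "[{:<8}]".format("ab")         # left-aligned in a width of 8
right = "[{:>8}]".format("ab")        # right-aligned
centered = "[{:*^9}]".format("mid")   # centered, filled with "*"
after_sign = "[{:=+6d}]".format(42)   # padding between sign and digits
thousands = "{:,}".format(1234567)    # commas as thousands separators
money = "[{:>10,.2f}]".format(1234.5) # width, commas, and precision

print(left)         # [ab      ]
print(right)        # [      ab]
print(centered)     # [***mid***]
print(after_sign)   # [+   42]
print(thousands)    # 1,234,567
print(money)        # [  1,234.50]
```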

To practice, let’s use the same examples we used with the % method, but switched to the equivalent format method.

n = 42
m = -22
s = "The secret number is {0}, not {1}".format(n, m) # Simple positional reference
print(s)
> The secret number is 42, not -22
s = "The secret number is {}, not {}".format(n, m) # Numbers are not needed for ordered input
print(s)
> The secret number is 42, not -22
s = "The secret number is {isthis}, not {isnot}".format(isthis=n, isnot=m)   # Reference by name (no dictionary needed)
print(s)
> The secret number is 42, not -22
s = "The secret number is {:+}, not {:+}".format(n, m) # Force use of signs
print(s)
> The secret number is +42, not -22

n = 42.2222222222222
m = -22.0
s = "The secret number is {:07.2f}, not {:07.2f}".format(n, m) # Pad with 0's, force length of 7, with 2 decimals of precision for floating point
print(s)
> The secret number is 0042.22, not -022.00
s = "The secret number is {:.2e}, not {:.2E}".format(n, m) # Use 2 decimals of precision, in two types of scientific notation
print(s)
> The secret number is 4.22e+01, not -2.20E+01
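
The examples above do not exercise the percent and general output types, so here is a short sketch of them. The input values are made up, and the last two lines compare a float formatted with "g" against one formatted with no type at all.

```python
percent = "{:.1%}".format(0.256)       # multiplied by 100, one decimal
print(percent)        # 25.6%

general_tiny = "{:g}".format(0.00001)  # small exponent: scientific
general_big = "{:g}".format(1234567.0) # large exponent: scientific
print(general_tiny)   # 1e-05
print(general_big)    # 1.23457e+06

# "g" drops trailing zeros, while no type keeps one decimal for floats.
with_g = "{:g}".format(1.0)
no_type = "{}".format(1.0)
print(with_g)         # 1
print(no_type)        # 1.0
```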

Looking at these examples, you can see why Python decided to switch from the % style to the format style. The syntax for the format method is practically identical to the syntax for all the Python string operations we discussed at the top of this tutorial. They all follow a format like [string].operation([arguments]). See? Once you get familiar with the syntax, you’ll start recognizing patterns and realize Python isn’t so hard after all!

