Introduction to Python String Operations and Formatting
In our previous tutorials, we used strings to supply input and output information to both the terminal and external files. This tutorial will take an in-depth look at Python strings, particularly Python string operations, and how we can format strings in output. If you follow the Python Tutorials Blog, you’re probably used to seeing the simple print function:
s = 1/7
print(s)
> 0.14285714285714285
In most settings, 17 decimals of precision would be misleading or unwise. Previously, we’ve used the round function to mathematically round these numbers to a set number of decimals:
print(round(s, 4))
> 0.1429
But what if we want to truncate this number instead? What if we want to represent this number in scientific notation? To do this without dedicated formatting, we would need to create a complex numeric function. Luckily for us, Python has a collection of string operations and formatting methods to solve this problem.
Python String Operations
Before we look at outputting strings, let’s look at a selection of Python string methods and functions that can be used to change the string objects themselves. These methods all take in a string and return a new object, usually a new string, with the given Python operation performed; the original string is never modified. This is different from other languages that perform a function on the given string object as a side effect without a return value. Therefore, we will always need to “catch” the output of these functions in a new (or the existing) variable if we wish to save the results.
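As a minimal sketch of this point (the variable names are only for illustration), calling a method such as upper leaves the original string untouched unless the result is assigned to a variable:

```python
# String methods return a new string; the original object is not modified.
title = "best of times"

title.upper()              # Result is discarded, so title is unchanged
unchanged = title

shouted = title.upper()    # "Catch" the result in a new variable
title = title.upper()      # ...or reassign the existing variable

print(unchanged)  # best of times
print(shouted)    # BEST OF TIMES
print(title)      # BEST OF TIMES
```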
split Method
The split method is used to convert a Python string into a list of substrings, separated by a given delimiter. split has the format [string].split([delimiter]), where [string] is the string to split and [delimiter] is the separator to split on:
s = "The best of times."
print(s.split(" ")) # Split with a space delimiter
> ['The', 'best', 'of', 'times.']
The Python split operation is useful for splitting lines retrieved from a file, for example with the readline method.
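The following is a rough sketch of that file use case. It assumes comma-separated lines, and it uses io.StringIO with made-up sample data as a stand-in for a real file so the example runs on its own:

```python
import io

# io.StringIO stands in for a real file object; the data is invented.
fake_file = io.StringIO("alice,34,engineer\nbob,29,designer\n")

rows = []
for line in fake_file:
    # Remove the trailing newline, then split the line on commas
    fields = line.rstrip("\n").split(",")
    rows.append(fields)

print(rows)  # [['alice', '34', 'engineer'], ['bob', '29', 'designer']]
```

Note that every field comes back as a string, so numeric columns would still need to be converted with int or float.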
join Method
The join method performs the reverse of the split method; it takes a list of Python strings and concatenates them together with a given delimiter. join has the format [delimiter].join([list]), where [list] is a list of strings, such as the one returned by the split method.
t = ['The', 'best', 'of', 'times.']
print(" ".join(t)) # Joining with a space delimiter
> The best of times.
Where the Python split method separates strings, the Python join method is the string operation to bring them back together.
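A short sketch of the round trip between the two methods is shown below. The last lines also illustrate that join only accepts strings, so other types need to be converted first (the list of numbers is just an example):

```python
s = "The best of times."

words = s.split(" ")           # ['The', 'best', 'of', 'times.']
rebuilt = " ".join(words)      # Same delimiter restores the original
dashed = "-".join(words)       # A different delimiter changes the result

print(rebuilt == s)  # True
print(dashed)        # The-best-of-times.

# join requires strings, so convert numbers before joining
numbers = [1, 2, 3]
joined_numbers = ", ".join(str(n) for n in numbers)
print(joined_numbers)  # 1, 2, 3
```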
count Method
The count method counts the number of non-overlapping instances of a substring within a given string. The count string operation is case sensitive and has the format [string].count([substring]), where [string] is the string to search and [substring] is the text to count:
s = "It was the best of times."
print(s.count("t")) # Count the number of t's in a string
> 4
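To make the case-sensitive and non-overlapping behavior concrete, here is a small sketch. The string "aaaa" is just an illustrative input:

```python
s = "It was the best of times."

lower_t = s.count("t")                   # Only lowercase t's are counted
upper_t = s.count("T")                   # There is no capital T in s
any_case_i = s.lower().count("i")        # Lowercase first to ignore case
non_overlapping = "aaaa".count("aa")     # Matches do not overlap

print(lower_t)          # 4
print(upper_t)          # 0
print(any_case_i)       # 2
print(non_overlapping)  # 2
```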
strip Method
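The strip method removes all leading and trailing instances of the given characters from a string. strip has the format [string].strip([characters]), where [characters] is treated as a set of individual characters to remove rather than as a single substring. If no argument is given, whitespace is removed.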
s = "****10.00***"
print(s.strip("*"))
> 10.00
s = "$****10.00" # Note that strip does not remove internal substrings
print(s.strip("*"))
> $****10.00 # The dollar sign prevented removal of the internal asterisks
strip will strip the given characters from both the left and right sides of the string, but will not touch characters in the middle. It’s a string cleanup operation. lstrip is used to only strip characters from the left side of the string, and rstrip will do the same for the right side of your Python strings.
s = "****10.00***"
print(s.lstrip("*"))
> 10.00***
print(s.rstrip("*"))
> ****10.00
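One detail worth sketching: the argument to strip is handled as a set of characters, not as one substring, and calling strip with no argument removes whitespace. The sample strings below are invented for illustration:

```python
raw = "  $10.00 USD\n"
clean = raw.strip()                 # No argument: strips whitespace
print(repr(clean))                  # '$10.00 USD'

label = "xyxhelloyx"
core = label.strip("xy")            # Strips any mix of x and y from both ends
print(core)                         # hello

price = "$****10.00"
amount = price.strip("$*")          # Both characters are removed from the left
print(amount)                       # 10.00
```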
replace Method
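The replace method will replace a given substring with another substring inside a given string. replace has the format [string].replace([old],[new],[count]), where [old] is the substring to find, [new] is its replacement, and the optional [count] limits how many occurrences are replaced: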
s = "$****10.00"
print(s.replace("*", "")) # Substitute asterisks with nothing
> $10.00
s = "The best of times."
print(s.replace("best", "worst"))
> The worst of times.
Where the strip operation is used to clean up characters from the left and right side of your string, the replace method can be used to remove (or replace) characters throughout your string.
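The optional count argument is not used in the examples above, so here is a brief sketch of it with an illustrative string:

```python
s = "a-b-c-d"

all_replaced = s.replace("-", "+")       # Every occurrence is replaced
first_two = s.replace("-", "+", 2)       # Only the first two occurrences

print(all_replaced)  # a+b+c+d
print(first_two)     # a+b+c-d
print(s)             # a-b-c-d (the original string is unchanged)
```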
upper and lower Methods
The upper and lower methods are used to change the capitalization of letters within a given string. These methods have the format [string].upper() and [string].lower():
s = "It was the BEST of times!"
print(s.upper())
> IT WAS THE BEST OF TIMES!
print(s.lower())
> it was the best of times!
These methods can be handy when requesting strings from a user that may have unknown capitalization.
s = input("Continue? Input y/n: ")
> Continue? Input y/n:
>> N
if s.lower() == "n": # Now the user can enter n or N.
    exit()
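The same idea can be wrapped in a small helper so it can be tried without typing input. The function name wants_to_quit is hypothetical, and accepting "no" as well as "n" is an assumption added here for illustration:

```python
def wants_to_quit(answer):
    """Return True if the answer means "no", ignoring case and outer spaces."""
    # strip() removes stray spaces; lower() removes the capitalization issue
    return answer.strip().lower() in ("n", "no")

results = {reply: wants_to_quit(reply) for reply in ["N", "n", " No ", "y"]}
for reply, quits in results.items():
    print(repr(reply), quits)
```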
find Method
The find method will search a string from left to right, and return the first (lowest) index of a given substring, or -1 if the substring is not found. It’s important to remember that index counting starts at 0 on the far left of your string. The find Python string operation both determines if the string contains the substring, and returns its location. find has the format [string].find([substring]), where [string] is the string to search and [substring] is the text to look for:
s = "It was the best of the times."
print(s.find("the"))
> 7
Similarly, rfind will behave the same way, but will find the last (highest) index of the given substring:
s = "It was the best of the times."
print(s.rfind("the"))
> 19
If you only need to validate that the string contains a given substring rather than return its location, the in operator can be used to output a boolean value:
s = "It was the best of the times."
print("the" in s)
> True
This is very handy when searching line by line through a file for a particular string.
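Here is a small sketch of that line-by-line search, using a list of strings in place of a file (the lines are sample text). It also shows that find returns -1 when nothing is found, so its result should be compared against -1 rather than tested directly as a boolean:

```python
s = "It was the best of the times."
missing = s.find("worst")
print(missing)  # -1, which means "not found"

lines = [
    "It was the best of times,",
    "it was the worst of times,",
    "it was the age of wisdom,",
]

# Collect the index of every line that contains the search term
matches = [i for i, line in enumerate(lines) if "worst" in line]
print(matches)  # [1]
```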
center Method
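The center method will center a string by applying a padding character on both sides to achieve a given total length. center has the format [string].center([totallength],[pad]), where [totallength] is the desired total width and [pad] is a single padding character (a space by default).
This Python string operation is useful when you want to print headers with a standard format. Note that when the padding cannot be split evenly, one side receives one more pad character than the other, as in the second example below.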
s = "Introduction"
t = "Table of Contents"
print(s.center(30, "-"))
> ---------Introduction---------
print(t.center(30, "-"))
> ------Table of Contents-------
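As a sketch of the header use case, the helper below (make_header is a hypothetical name, and the default width of 30 is an arbitrary choice) produces headers of a consistent width. It also shows that a string already longer than the requested width is returned unchanged:

```python
def make_header(title, width=30, pad="-"):
    """Return the title centered within `width` characters of padding."""
    return title.center(width, pad)

headers = [make_header(t) for t in ["Introduction", "Methods", "Results"]]
for h in headers:
    print(h)

# A title longer than the width is returned as-is, without truncation
too_long = "Table of Contents".center(5, "-")
print(too_long)  # Table of Contents
```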
Python String Formatting
Now that we’ve looked at manipulating string objects, let’s look at how to change the output of variables in print statements and file output.
There are two ways to format Python output:
- the old % operation, and
- the newer format method operation.
The % operation for formatting output was used in older releases of Python, and most legacy code and online tutorials will use this method. The format method is the preferred option for modern implementations of Python, and indeed some IDEs will flag the % method as an error, since they assume that % is only used as the modulo binary operator. The % method can still be used in Python Version 3, but the format method is preferred unless you are working with older code.
% Method
The % method works by inserting conversion specifiers into a string, each of which will be substituted by a variable or object supplied after the string. Let’s look at the details of these specifiers, then we’ll go over some examples.
Specifiers have the following format of required and optional parameters, which must be inserted in this order:
- The character %
- (Optional) Mapping key contained within parentheses: (key)
- (Optional) Conversion flags (see below)
- (Optional) Minimum field width, or how much character space must be saved for the input variable
- (Optional) Precision, or how many decimals are included in floating point numbers
- Conversion Type (presented in table below)
After a string containing these specifiers is ended, a % followed by a single value, a tuple, or a dictionary is used to indicate the objects to be inserted.
The following is a table of conversion types that can be used:
Python Conversion Types
Symbol | Output Format | Note |
---|---|---|
"s" | String | Will convert non-string types into strings (e.g. floats and integers) |
"d" | Signed (+/-) Integer Decimal | "i" can also be used |
"f" | Floating Point Decimal | "F" can also be used |
"e" | Floating Point in Scientific Notation (e.g. 1.2e+10) | The symbol "E" outputs a capital "E" in output (e.g. 1.2E+10) |
"g" | Floating Point with Variable Scientific Notation | Will output a regular floating point, switching to scientific notation if the exponent is less than -4 or not less than the precision (default 6). "G" will output a capital "E" in exponents |
"%" | % Symbol | Used as an escape character for literally inserting a % into a string. |
There are additional conversion types available for output in hexadecimal and octal, but those are less commonly used.
The following is a set of conversion flags for specifying additional operations within the specifier:
Python Conversion Flags
Symbol | Operation | Note |
---|---|---|
"0" | Zero Padding for Numeric Values | If a numeric value is shorter than the specified output width, "0"s will be used as padding |
"-" | Left Align | Values are left-aligned. Overrides "0" padding above |
" " (Space) | Leading Decimal Space | Leave a blank before positive numbers to align with negative numbers starting with a "-" |
"+" | Force Preceding Sign Character (+/-) | Will always show a sign for numeric values, whether negative or positive |
Okay, I know we just threw a lot of information out there. Let’s take a look at an example to help clear things up. Suppose we wanted to insert a number into a string:
n = 42
s = "The secret number is %d" % n
print(s)
> The secret number is 42
Let’s break down the code from above. Inside of a string, we inserted the specifier %d. The specifier started with the required %, had no optional modifiers, and ended with the required conversion type d for a decimal output format. The string was followed by another %, indicating that we wish to format the string with variables, followed by the variable we wanted to insert. Because only one value is inserted, it does not need to be wrapped in a tuple.
We can also use more advanced conversion types on the above example:
n = 42
m = -22
s = "The secret number is %(is)d, not %(isnot)d" % {"is": n, "isnot": m} # Use a dictionary as input
print(s)
> The secret number is 42, not -22
s = "The secret number is %d, not %d" % (n, m) # Use a tuple as input
print(s)
> The secret number is 42, not -22
s = "The secret number is %+d, not %+d" % (n, m) # Force use of signs
print(s)
> The secret number is +42, not -22
n = 42.2222222222222
m = -22.0
s = "The secret number is %07.2f, not %07.2f" % (n, m) # Pad with 0's, force length of 7, with 2 decimals of precision for floating point
print(s)
> The secret number is 0042.22, not -022.00
s = "The secret number is %.2e, not %.2E" % (n, m) # Use 2 decimals of precision, in two types of scientific notation
print(s)
> The secret number is 4.22e+01, not -2.20E+01
Formatting Python strings can be complicated. The best way to get comfortable with all the syntax is to experiment. Create short Python programs like the ones above and tinker with the % strings using the tables above.
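As one possible starting point for that experimentation, the sketch below exercises a few flags and conversion types from the tables that the examples above do not cover. All values are made up for illustration:

```python
price = 3.5

right = "[%8.2f]" % price             # Width 8, right-aligned by default
left = "[%-8.2f]" % price             # "-" flag left-aligns within the width
spaced = "[% d] [% d]" % (7, -7)      # " " flag leaves a blank before positives
percent = "%d%% done" % 85            # "%%" inserts a literal percent sign
mixed = "%s scored %d" % ("Ada", 97)  # "s" and "d" in the same string

for line in (right, left, spaced, percent, mixed):
    print(line)
# [    3.50]
# [3.50    ]
# [ 7] [-7]
# 85% done
# Ada scored 97
```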
format Method
The format method is called on the string object itself, like the string operations covered earlier, rather than being applied through an external operator like the older % method.
The format method follows closer to the Python standard of object use, and is preferred for use in new programs. The format method uses specifiers within a Python string, followed by a method call that supplies the values that replace those specifiers.
At first glance the format method can look more confusing than the % method, but in practice it really is more straightforward - it just has more options. The nested lists of options will be detailed below. Don’t worry. I’ll follow up with some examples.
Like the % format method, there is a required order of symbols used in the specifier:
- Open curly brace "{"
- (Optional) Reference keyword (or integer index for positional inputs)
- (Optional) "!" + Conversion Type. Converts the given variable into a string before formatting, using "!s" (str), "!r" (repr), or "!a" (ascii).
- (Optional) ":" + Format Specifier. See below.
- Close curly brace "}"
The format specifier (item 4 in the list above) follows an additional set of ordering rules, which closely (but not exactly) mimic the % format specifiers. These specifiers must follow the given order:
- Fill Character. Default is " " (space). A fill character must be followed by an alignment character.
- Alignment character. See table below.
- Sign Option. See table below.
- Minimum character width (a leading "0" turns on zero padding for numbers)
- "," to specify commas as thousands separators
- "." followed by number of decimals of precision
- Output Type. See table below.
The following are the available alignment characters that specify how the string is aligned within the given width:
Python Format Alignment Characters
Symbol | Output Format | Note |
---|---|---|
"<" | Left Aligned | Forces left alignment. Default for all classes except numerics. |
">" | Right Aligned | Forces right alignment. Default for numeric types. |
"=" | Padding After Sign | Forces padding after the sign, before a number |
"^" | Center Align | Forces the field to be centered |
The following is a table of output types that can be used with the format method. Note: While there are many similarities between this table and the one for the % method, not all entries are identical. The default value for each of these depends on the Python class type of the referenced variable (i.e. int, float, string, etc.). If a type is omitted from the specifier, then the default for the referenced object’s class will be used.
Python Format Output Types
Symbol | Output Format | Note |
---|---|---|
"s" | String | Default for strings |
"d" | Signed (+/-) Integer Decimal | Default for integer types |
"f" | Floating Point Decimal | "F" can also be used |
"e" | Floating Point in Scientific Notation (e.g. 1.2e+10) | The symbol "E" outputs a capital "E" in output (e.g. 1.2E+10) |
"g" | General Format | Will output a regular floating point, switching to scientific notation if the exponent is less than -4 or not less than the precision (default is 6). "G" will output a capital "E" in exponents. Floats with no type given are formatted in a similar way. |
"%" | Percents | Multiplies number by 100, outputs it in "f" format, and appends a percent sign |
In most cases, the format method follows the same convention as the % method, but replaces % specifiers with : and contains all references within curly braces {}.
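Before moving on, the output types from the table above can be sketched quickly. The numbers are arbitrary, chosen only to show where "g" switches between regular and scientific notation:

```python
plain = "{:g}".format(1234.5)         # Exponent 3 is below the precision of 6
large = "{:g}".format(12345678.0)     # Exponent 7 is not less than 6
small = "{:g}".format(0.00001234)     # Exponent -5 is less than -4
share = "{:.1%}".format(0.256)        # Multiplied by 100, then a % is appended
sci = "{:.3e}".format(1 / 7)          # Three decimals in scientific notation

for text in (plain, large, small, share, sci):
    print(text)
# 1234.5
# 1.23457e+07
# 1.234e-05
# 25.6%
# 1.429e-01
```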
To practice, let’s use the same examples we used with the % method, but switched to the equivalent format method.
n = 42
m = -22
s = "The secret number is {0}, not {1}".format(n, m) # Simple positional reference
print(s)
> The secret number is 42, not -22
s = "The secret number is {}, not {}".format(n, m) # Numbers are not needed for ordered input
print(s)
> The secret number is 42, not -22
s = "The secret number is {isthis}, not {isnot}".format(isthis=n, isnot=m) # Reference by name (no dictionary needed)
print(s)
> The secret number is 42, not -22
s = "The secret number is {:+}, not {:+}".format(n, m) # Force use of signs
print(s)
> The secret number is +42, not -22
n = 42.2222222222222
m = -22.0
s = "The secret number is {:07.2f}, not {:07.2f}".format(n, m) # Pad with 0's, force length of 7, with 2 decimals of precision for floating point
print(s)
> The secret number is 0042.22, not -022.00
s = "The secret number is {:.2e}, not {:.2E}".format(n, m) # Use 2 decimals of precision, in two types of scientific notation
print(s)
> The secret number is 4.22e+01, not -2.20E+01
Looking at these examples, you can see why Python decided to switch from the % style to the format style. The syntax for the format method is practically identical to the syntax for all the Python string operations we discussed at the top of this tutorial. They all follow a format like [string].operation([arguments]). See? Once you get familiar with the syntax, you’ll start recognizing patterns and realize Python isn’t so hard after all!
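As a closing sketch, here are the fill, alignment, and thousands-separator options described earlier, which the examples above do not use. The inputs are arbitrary:

```python
word = "abc"

left = "{:<7}".format(word)          # Left-aligned in a field of width 7
right = "{:>7}".format(word)         # Right-aligned in the same width
centered = "{:*^7}".format(word)     # Centered, using "*" as the fill character
signed = "{:0=+7}".format(42)        # "=" puts the zero padding after the sign
grouped = "{:,}".format(1234567)     # Commas as thousands separators

for text in (left, right, centered, signed, grouped):
    print("[" + text + "]")
# [abc    ]
# [    abc]
# [**abc**]
# [+000042]
# [1,234,567]
```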