Introduction to VBA RegEx

A regular expression, or RegEx, is a sequence of characters that defines a search pattern which is used for finding a specific pattern of characters in a string. This tutorial will show you a VBA RegEx pattern example and will demonstrate why using regular expressions in VBA is so powerful.

Regular expressions are a very large topic and many books have been written about it. It’s also a topic that many people find intimidating due to the rather cryptic and compact syntax and because the syntax varies considerably from language to language.

The VBA RegEx syntax differs quite a lot from other RegEx syntaxes, such as those used in Java, Perl or Python. This is primarily because the VBA RegEx object comes with optional parameters that you must sometimes specify to obtain the same results as you would by using standard RegEx syntax in other languages. I know. This explanation even sounds weird.

For these reasons, we’ll focus on VBA RegEx in this article. There are may be a few sections where we’ll assume you already have a basic understanding of regular expressions, but don’t worry if you’re new to RegEx! If you’re completely new to the topic and you find it slightly intimidating or difficult, you should still be able to grasp the basic concepts after reading this, so stick with us!

We’ll start by presenting the essential Regex syntax and then we’ll show you how to instantiate the RegEx object in VBA. Next, we’ll describe the parameters of the VBA RegEx object and present a general VBA RegEx function which returns substring matches from the input text passed to the function. We’ll also show you a fairly advanced example which incorporates many of the RegEx features we’ve previously presented and then we’ll finish off with a brief discussion on application ideas and when not to use VBA RegEx.


VBA RegEx syntax essentials

A RegEx search pattern is a sequence of characters consisting of one or more of the following components:

  • Literal characters.
  • Metacharacters.

Literal characters simply match their counterparts in the input text. For instance, the literal character sequence abc in a search pattern will simply match all occurrences of abc in the input text. Metacharacters, on the other hand, are far more abstract. We’ll cover the most important ones next.

Common metacharacters

A metacharacter is a character, or combination of characters, with a special meaning in a RegEx pattern. Unlike literal characters, metacharacters can match several different characters in the input text or they can even represent something other than a character. I know some of these descriptions may be a bit confusing, but I encourage you to take a moment to familiarize yourself with the following metacharacters:

Metacharacter Description
. "Wildcard." The unescaped period matches any character, except a new line.
^ "Beginning of a string or line" or "negation." The significance of the caret in search patterns is context-dependent. 1) If the VBA MultiLine parameter is set to True, the caret means "beginning of line," rather than "beginning of string." 2) If the caret is placed within a character class [^], it acts as a negation, i.e. "match all characters NOT matched by the other parameters of the character class."
$ "End of a string." The significance of the dollar sign in search patterns is context-dependent. If the VBA MultiLine parameter is set to True, the dollar sign means "end of line," rather than "end of string."
\ "Escape." The backslash in front of a metacharacter turns it into a literal character.
\b "Word boundary" or "backspace character." Outside character classes, \b matches a position before or after a word within the text source. Within character classes, \b denotes the backspace character.
\B "Not a word boundary." \B is the negation of \b, but has no alternate meaning within character classes.
\d "Digit." Matches any digit from 0-9.
\D "Not digit." Matches any character that's not a digit.
\s "Whitespace." Matches a space, newline or tab character.
\S "Not whitespace." Matches a character that's not a space, newline or tab.

The characters ^, $ and \b are called anchors, since they match a position before, after, or between characters.

Operators

Some metacharacters change how one or more of the other components in the search pattern is interpreted, i.e. they perform operations on these other components. VBA RegEx comes with three types of operators. The last one in the table is actualy a group of metacharacter expressions:

Operator Meta-character Description Example
Boolean "or" | The vertical bar denotes the boolean "or" operator. a|b matches either "a" or "b".
Grouping () Parentheses are used for several purposes: 1) to define the scope and precedence of operators.2) to group characters and remember text. h(a|e)y matches either "hay" or "hey".
Quantification ? Zero or one occurrences of the preceding element. colou?r matches both "color" and "colour".
* Zero or more occurrences of the preceding element. ab*c matches "ac", "abc", "abbc", "abbbc", and so on.
+ One or more occurrences of the preceding element. ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".
{n} The preceding item is matched exactly n times. a{3} matches "aaa".
{min,} The preceding item is matched min or more times. a{1,} matches "a", "aa", "aaa" and so on.
{min, max} The preceding item is matched at least min times, but not more than max times. a{1,3} matches "a", "aa" and "aaa", but not "aaaa".

Character classes

Character classes or character sets are specified with square brackets [ ]. Some of the most common ones are:

[a-z]

The set of lower-case letters ranging from a to z.

[A-Z]

The set of upper-case letters ranging from A to Z.

[0-9]

The set of single digits ranging from 0 to 9.

Character classes are frequently used in conjunction with operators in the search pattern. For instance, [0-5]+ translates to “find one or more digits”, rather than just a single digit in the range from zero to five.

After this rather lengthy intro on the VBA RegEx syntax essentials, let’s now turn to the VBA RegEx setup!


Write better macros in half the time
I see people struggling with Excel every day and I want to help. That's why I'm giving away my personal macro library for free. This powerful gift lets you automatically import all my macros directly into your spreadsheets with just one click.

I want to join the free wellsrPRO VBA Training program

VBA RegEx function

Start by opening the Visual Basic Editor and then go to Tools > References… and set a reference to the Microsoft VBScript Regular Expressions 5.5 Object Library.

VBA RegEx Reference
VBA RegEx Reference

Next, paste the code below into a standard code module:

Sub CallRegEx()
'Must add reference (Tools > References) to the 
'    "Microsoft VBScript Regular Expressions 5.5" Object Library
    Dim r As Match
    Dim mcolResults As MatchCollection
    Dim strInput As String, strPattern As String
        
    strInput = "The email address,'Mary-Jo.T.Williamson_01+test@test-site.com', is contained within this string"
    strPattern = "[a-z0-9-.+_]+@[a-z-]+\.[a-z]+"
    
    Set mcolResults = RegEx(strInput, strPattern, , , True)
    
    'print the returned results to the immediate window
    If Not mcolResults Is Nothing Then
        For Each r In mcolResults
            '***********************************
            'Insert your code here
            '***********************************
            Debug.Print r ' remove in production
        Next r
    End If
End Sub

Function RegEx(strInput As String, strPattern As String, _
    Optional GlobalSearch As Boolean, Optional MultiLine As Boolean, _
    Optional IgnoreCase As Boolean) As MatchCollection
    
    Dim mcolResults As MatchCollection
    Dim objRegEx As New RegExp
    
    If strPattern <> vbNullString Then
        
        With objRegEx
            .Global = GlobalSearch
            .MultiLine = MultiLine
            .IgnoreCase = IgnoreCase
            .Pattern = strPattern
        End With
    
        If objRegEx.Test(strInput) Then
            Set mcolResults = objRegEx.Execute(strInput)
            Set RegEx = mcolResults
        End If
    End If
End Function

Run the CallRegEx procedure to invoke the RegEx function.

Inside the CallRegEx procedure you specify the input text with the strInput variable and you define your search pattern in the strPattern variable. In this example, we’ve included a string containing a hypothetical email address which contains all character types allowed in valid email addresses. (Yes, plus signs are allowed). Regardless of how you rearrange this email address within the input string, our custom RegEx function will find it and return it. That is, provided you pass the correct parameters to it!

The first two parameters of our RegEx function are (1) the input string and (2) the search pattern variables we just discussed. These parameters are mandatory, and it should be self-evident why that is. Conversely, the next three parameters are optional:

Optional Parameters Type Description
Global Boolean If set to True, all matches will be returned ("greedy matching"). If set to False, only the first match will be returned ("lazy matching"). The default value is False.
IgnoreCase Boolean If set to True, the pattern matching will be case-insensitive. The default value is False (case-sensitive).
MultiLine Boolean If set to True, it changes the interpretation of the ^ ("beginning of string") and $ ("end of string") meta-characters in the search pattern so that they match the beginning and end of a line instead. The default value is False.

In the CallRegEx example, we’ve set the GlobalSearch parameter to False, but had we set it to True and had there been more email addresses in the input string, they would have all been returned in the match collection. The only optional parameter we set to True is the IgnoreCase parameter. Without it, the email search pattern in the CallRegEx procedure would have been considerably longer. As a matter of fact, this is what we would have had to type if we didn’t set the IgnoreCase parameter to True.

[a-zA-Z0-9-.+_]+@[a-zA-Z-]+\.[a-zA-Z]+


That’s pretty ugly, isn’t it? Just in case you’re new to RegEx and this search pattern is complete gobbledygook to you, let’s break it down in more detail!

The bracketed terms in the pattern above denote character sets and the plus signs at the end of them are quantifiers specifying how many characters we’re looking for. Finally, the “at” character (@) in the middle denotes a literal search for this character (In other words, we’re looking for an @ symbol). The dot (.) between the last two pair of square brackets is also searching for a literal dot, but this character has been escaped by a backslash, since the dot is a reserved symbol in RegEx syntax.

Reading from left to right, we could translate the search pattern example to English like this:

Find one or more occurrences of either a lower-case or upper-case letter ranging from a to z, a number ranging from 0 to 9, a hyphen (-), dot (.), plus (+) or an underscore sign (_)

AND

an “at” sign (@)

AND

one or more occurrences of either a lower-case or upper-case letter ranging from a to z, or a hyphen (-)

AND

a dot (.)

AND

one or more occurrences of either a lower-case or upper-case letter ranging from a to z

If (and only if) all the conditions are met and all the matching characters in the input string are contiguous and in the right order, the RegEx function will return a match. This means whitespace characters within search patterns radically alter the meaning of them. If you include a whitespace character in your search pattern by mistake the RegEx engine will look for a whitespace amid the other characters you’ve specified!


Write better macros in half the time
I see people struggling with Excel every day and I want to help. That's why I'm giving away my personal macro library for free. This powerful gift lets you automatically import all my macros directly into your spreadsheets with just one click.

I want to join the free wellsrPRO VBA Training program

Application Ideas for VBA RegEx

In this tutorial, we’ve focused on the “find” functionality of regular expressions, but it should be mentioned that they’re often used for “find and replace “ too. The latter is what the Replace method of the VBA RegEx object allows you to do. Including the “find” and “find and replace” functionalities, you can use VBA regular expressions for a wide range of advanced text processing, such as

  • Validation of user input . For instance, you could use a RegEx function for validating passwords or email addresses.
  • Finding and deleting duplicate words . Most text processing software, such as MS Word, can locate duplicate word occurrences for you, but if it’s too time-consuming to remove the duplicates manually, RegEx may be the answer.
  • Localization. In case you need to translate a text into another language, the date and number formats of the two languages might differ from one another. For instance, if the input text contains thousands of numbers, it can be well worthwhile to write a RegEx algorithm, rather than editing manually.

Even though there are many legitimate RegEx uses, there are also instances where you probably shouldn’t use this approach:

  • When other functions do the job. VBA RegEx is extremely powerful because it allows you to extract exactly the information you want from an input text, but it’s overkill if the native Excel functions and operators such as Like, Find or Replace can do the job.

  • Think twice before using RegEx to parse html. Sure, you could get lucky and get the results you want. Just realize that going this route means you could be asking for a world of hurt where you’ll keep getting weird errors that you won’t be able to fix properly. If that happens, the only stable solution to the problem is to refactor your code without RegEx. Take my word for it, I’ve been there!


Comments

If you haven’t already done so, please subscribe to my free wellsrPRO VBA Training Program. What are you waiting for? You’ll love the great VBA content I send your way!

Share this article on Google+, Twitter and Facebook, then leave a comment below and let’s have a discussion.

This article was written by Michael H. Sorensen on behalf of The VBA Tutorials Blog.

About The VBA Tutorials Blog

Ryan Wells

The VBA Tutorials Blog was created by Ryan Wells, a Nuclear Engineer and professional VBA Developer. He created this site to help others learn to write macros in Excel. That's why he developed a unique 3-part free Excel training program to help others quickly learn VBA in a natural setting: right inside Excel.