How to get Excel in regex?
Excel is wonderful. Excel is life. It has everything you could possibly ever want. How could you possibly improve on perfection? You couldn't suggest anything to even slightly change my mind, I am completely...
Regex you say?
Dang, that would be pretty sweet.
Step by step regex in all your workbooks
- Download the following Excel workbook.
- Enable editing for the protected sheet.
- Save the workbook as an add-in. 1. File > Save As > Select Excel Add-in (*.xlam) 2. At this point it will automatically select the correct destination.
- Then you need to open the add-ins menu 1. File > Options (a new window will open) > Add-ins > Manage Excel Add-ins Go
- Finally enable the add-in with the same name of this sheet. If you haven't changed the name it's Regular-Expressions-Examples.
- You're ready to go! You now have regex in all your workbooks.
(Full credit to Patrick Matthews for actually writing these functions, you can find his article on VBA & regex here.)
What functions do you get?
This will then give you access to the following formulas.
- RegExpFind
- RegExpFindExtended
- RegExpFindSubmatch
- RegExpReplace
- RegExpReplaceExtended
- RegExpReplaceRange
Boy that's a lot of formulas...
I'm only going to focus on two, which most people which will cover the majority of basic use cases:
- RegExpFind
- RegExpReplace
If you just wish to copy paste, then the next section has examples.
If you'd like to dig into the ins and outs I've written out proper documentation in the section after.
Useful examples of Excel regex formulas
-
Does a cell match regex?* (case insensitive)
=ISNUMBER(RegExpFind(cell_reference, regex_goes_here, ,FALSE,1,FALSE))
-
Does a cell match regex?* (case sensitive)
=ISNUMBER(RegExpFind(cell_reference, regex_goes_here, ,TRUE,1,FALSE))
-
Extract value from string (case insensitive) (xth match)
=RegExpFind(cell_reference, regex_goes_here,x,FALSE,0,FALSE)
- This will return exactly what is matched. It does not unfortunately support capturing groups, so you're matching the xth match of the whole regex.
-
Extract value from string (case insensitive) (xth match)
=RegExpFind(cell_reference, regex_goes_here,x,TRUE,0,FALSE)
-
Replace value in string (case insensitive)
=RegExpReplace(cell_reference,regex_goes_here,replace_value,FALSE)
- Replace value in string (case sensitive) -
=RegExpReplace(cell_reference,regex_goes_here,replace_value,TRUE)
Documenting the functions
It took me a little while to clock what the functions all did, so here is some documentation to help you all out.
For those of you not familiar with this particular format of documentation, any inputs with an equals sign after them e.g. case_sensitive=FALSE
are optional. The value after the equals is the default value. ""
means empty.
RegExpFind
-
RegExpFind(search, pattern,match_to_return="", case_sensitive=TRUE, return_type=0, multiline=FALSE)
search
(str or cell ref) - The string or cell reference you want to search.pattern
(str) - The regex you want to match.-
match_to_return
(int) :- Default: Returns a zero-indexed array of all matches
- x: Returns the xth match
- 0, -1 : Returns the last match
- -x: Returns the xth to last match.
-
case_sensitive
(boolean) - If TRUE the regex is case sensitive and vice versa.- Defaults to TRUE.
-
return_type
(int) :- 0 - Return the match
- 1 - Numerical position of the first character of the match
- 2 - The length of the matches
-
multiline
(boolean) - If FALSE then ^ and $ will match the beginning and end of all the text in the cell. If TRUE, then they will match each individual line in the cell (if you've copied in an entire article for example).- Defaults to FALSE.
RegExpReplace
-
RegExpReplace(search, pattern, replace_with="", replace_all=TRUE, case_sensitive=TRUE, multiline)
search
(str or cell ref) - The string or cell reference you want to search.pattern
(str) - The regex you want to match.replace_with
(str) : The string you want to replace the match with.-
replace_all
(boolean) : If TRUE will replace all the matches, if FALSE will only replace the first.- Defaults to TRUE.
- If you need to replace the nth match you'll to use RegExpReplaceRange (which isn't documented here the moment. It is however in the raw code on Patrick Matthews post.
-
case_sensitive
(int) : If FALSE the regex is not case sensitive and vice versa.- Defaults to TRUE.
-
multiline
(boolean) - If FALSE then ^ and $ will match the beginning and end of all the text in the cell. If TRUE, then they will match each individual line in the cell (if you've copied in an entire article for example).- Defaults to FALSE.
As mention I've only looked at the two most basic functions as those will meet most peoples needs. (If you'd like more please read Patricks post and code!)