Excel's REGEX Functions: A Practical Guide to REGEXTEST, REGEXEXTRACT, and REGEXREPLACE
For most of Excel's history, parsing irregular text meant chaining LEFT, RIGHT, MID, FIND, SEARCH, SUBSTITUTE, and a stack of IF statements. The formulas were long, fragile, and always one edge case away from breaking. Anyone who has ever cleaned a column of mixed-format phone numbers or extracted invoice references from a free-text field knows the feeling.
That changed when Microsoft added three regular expression functions to Excel: REGEXTEST, REGEXEXTRACT, and REGEXREPLACE. They went into the Insiders channel in May 2024 and reached general availability across Microsoft 365 over the following year. They use PCRE2, the same regex flavor as regex101.com and most modern programming languages, which means patterns built and tested elsewhere drop into Excel directly.
This article covers what each function does, the patterns that come up most in actual spreadsheets, the things they are and are not good at, and a few non-obvious behaviours worth knowing before you build a workbook around them.
Which Excel Versions Have REGEX
The three functions are available in Microsoft 365 (Excel for the web, Windows, Mac) and Excel for iPad. They are not in perpetual-licence Excel 2021 or earlier. If you type =REGEXTEST( and the formula bar shows #NAME? after you confirm, your version does not have them.
For shared workbooks where some collaborators are on perpetual licences, the formulas will simply return #NAME? on the older version. There is no graceful fallback inside the file itself. The pragmatic options are to either standardise the team on Microsoft 365 or to keep the regex work in a separate sheet that older clients are not expected to interact with.
REGEXTEST: A Boolean Match Check
Signature: =REGEXTEST(text, pattern, [case_sensitivity])
REGEXTEST returns TRUE if the pattern matches anywhere in the text and FALSE if it does not. The third argument is optional and controls case sensitivity. The default is case-insensitive on the Mac and Windows builds at the time of writing, which is the opposite of most regex engines, so check before relying on default behaviour.
This is the function that replaces the largest amount of legacy spreadsheet code. Anywhere you previously wrote =ISNUMBER(SEARCH("xyz", A2)) or stacked IFs to check whether a value matched several patterns, REGEXTEST collapses the logic to a single call.
A few patterns that come up constantly:
=REGEXTEST(A2, "^\d{5}(-\d{4})?$") validates a US zip code in either the five-digit or nine-digit format.
=REGEXTEST(A2, "^[\w.-]+@[\w.-]+\.\w{2,}$") does a serviceable email format check. Note the deliberate choice of "serviceable" rather than "complete." Real email validation requires far more than a regex, and the practical recommendation is to do basic format checking in the spreadsheet and rely on the actual mail system for anything that has to be deliverable.
=REGEXTEST(A2, "^(?=.*[A-Z])(?=.*\d).{8,}$") checks that a password meets a basic policy: at least eight characters, with at least one uppercase letter and one digit. Useful for vetting an export of user records before passing them to a system that enforces the policy.
The function is also the cleanest way to power conditional formatting. Set the rule to "Use a formula to determine which cells to format" and write =REGEXTEST(A2, pattern). The cells that match get the formatting. This is materially better than the equivalent ISNUMBER(SEARCH(...)) approach because the regex can describe the pattern instead of approximating it.
REGEXEXTRACT: Pulling Substrings Out of Text
Signature: =REGEXEXTRACT(text, pattern, [return_mode], [case_sensitivity])
REGEXEXTRACT is the function for the case where a string contains a piece of information you want, surrounded by content you do not. The return_mode argument controls how it gives back results.
Return mode 0 (the default) returns the first match as a single value. Use this when there is exactly one thing you are extracting from each cell.
Return mode 1 returns all matches as a spilled vertical array. Use this when one cell contains multiple instances of the same pattern and you want them as separate rows.
Return mode 2 returns the capture groups from the first match as a spilled horizontal array. Use this when a single string contains several distinct fields, each captured by its own group, that you want as separate columns.
A few examples that map to real spreadsheet work:
Extracting the numeric portion of an SKU: =REGEXEXTRACT(A2, "\d+") returns the first run of digits. If your SKUs look like "SKU-12345-RED", this returns "12345" as text. Wrap with VALUE() if you need a number.
Extracting all dollar amounts from a free-text note: =REGEXEXTRACT(A2, "\$[\d,]+(\.\d{2})?", 1) with return mode 1 spills every match into rows below the formula. Useful for summarising expenses written into a notes column.
Splitting a fully formatted name into its parts: =REGEXEXTRACT(A2, "^(\S+)\s+(.*?)\s+(\S+)$", 2) with return mode 2 captures first name, middle, and last name into three adjacent cells in one formula. The pattern is naive, in the way name patterns always are, but the technique generalises.
The most common mistake with REGEXEXTRACT is forgetting that it returns text. If you extract digits and need to do arithmetic on them, wrap the call in VALUE or use the unary minus prefix to coerce. Without coercion, downstream SUMs silently treat the extracted strings as zero.
REGEXREPLACE: Rule-Based Find and Replace
Signature: =REGEXREPLACE(text, pattern, replacement, [occurrence], [case_sensitivity])
REGEXREPLACE is the function that turns a column of inconsistent text into a column of consistent text. The first three arguments are the obvious ones. The fourth argument is the occurrence to replace: leave it blank to replace all matches, set it to 1 to replace only the first, 2 for only the second, and so on.
Capture groups in the replacement string are referenced as $1, $2, $3, and so on. The literal string $0 refers to the entire match. This is the same convention as JavaScript and most other modern regex engines.
Patterns that come up often:
Stripping non-digits from phone numbers: =REGEXREPLACE(A2, "\D", "") removes every character that is not a digit. The result is the bare phone number suitable for a database key, regardless of whether the original was "(555) 123-4567" or "555.123.4567" or "+1 555-123-4567".
Normalising whitespace: =REGEXREPLACE(TRIM(A2), "\s+", " ") collapses runs of multiple spaces, tabs, and other whitespace characters into a single space. The TRIM handles leading and trailing whitespace; the regex handles internal runs. The two together give you clean text in one expression.
Reformatting dates: =REGEXREPLACE(A2, "(\d{4})-(\d{2})-(\d{2})", "$3/$2/$1") rewrites an ISO-format date as DD/MM/YYYY by capturing the three components and reassembling them in a different order. This is useful for one-off reformatting jobs where converting to and from a real date type would be more work than the rewrite.
Masking sensitive data: =REGEXREPLACE(A2, "\d(?=\d{4})", "*") masks all but the last four digits of a numeric string. Each digit followed by at least four more digits is replaced with an asterisk. The lookahead is what makes this work without consuming the trailing digits.
What Makes the Excel Implementation Specific
Three details of the Excel implementation are worth knowing because they catch people coming from other regex environments.
First, the functions use PCRE2 specifically. PCRE2 supports almost everything you would expect from Perl-compatible regex, including lookaheads, lookbehinds, named groups, and Unicode properties. If a pattern works on regex101.com with the PCRE2 flavor selected, it should work in Excel.
Second, all three functions accept arrays. If you pass a range to the text argument, the function evaluates against each cell and spills the results. =REGEXTEST(A2:A100, pattern) returns a 99-row array of TRUE and FALSE values, which can feed FILTER or SUMPRODUCT. This is useful and not always advertised. It also means you do not need to wrap the call in any spill helper to get array behaviour.
Third, the case sensitivity argument is the third positional argument and uses 0 for case-sensitive, 1 for case-insensitive. The default behaviour differs across Microsoft documentation and platform, so the safe practice is to set it explicitly when case matters. If your pattern itself uses (?i) or (?-i) flag groups, those override the function argument for the parts of the pattern they apply to.
Where REGEX Functions Are Not the Answer
Like any tool, the regex functions have limits. Three honest cases where they are the wrong choice:
Highly structured data with reliable delimiters does not need regex. If your column is comma-separated and you want the third field, TEXTSPLIT or TEXTBEFORE/TEXTAFTER are simpler and more readable than the equivalent regex pattern. Regex earns its keep on irregular text. On regular text, simpler functions win.
Recursive or context-sensitive parsing is beyond regex regardless of flavor. Matching balanced parentheses in arbitrarily nested expressions, validating actual XML or JSON structure, or anything where the meaning of a token depends on tokens far away in the string is a job for a parser, not a pattern. Excel does not have a built-in parser, but Power Query handles many of these cases with M, and Python in Excel handles the rest.
Patterns that have to be edited frequently by non-technical colleagues are a poor fit for regex. The syntax is dense, the failure modes are silent, and a misplaced character can flip an entire column from correct to wrong without warning. If a workbook will be maintained by people who do not write regex regularly, prefer named columns of cleanup rules and a Power Query step over a single regex formula that nobody else can debug.
Combining REGEX with Other Functions
The regex functions are most powerful in combination with the rest of Excel's modern formula language.
REGEXTEST inside FILTER lets you pull rows by pattern: =FILTER(A2:C100, REGEXTEST(A2:A100, "^INV-2026")) returns only the invoice rows for 2026. The same pattern with SUMIFS or COUNTIFS would require an awkward wildcard, which does not support the full regex syntax.
REGEXEXTRACT inside LET lets you parse a string once and refer to the parts by name. We covered LET in detail in our LET function guide, and the combination with regex is one of the cases where LET pays off most clearly.
REGEXREPLACE chained through multiple steps lets you build a small cleanup pipeline without leaving the formula bar. The first call strips one kind of noise, the second normalises whitespace, the third applies a final format. Each step is auditable on its own.
A Note on AI and Regex
The single most common practical use of AI for spreadsheet work in 2026 is asking a model to write a regex pattern from a natural-language description. Tools like ChatGPT and Claude are reliably good at this. Describe the pattern you want, paste a few representative inputs, ask for a PCRE2-compatible regex, and verify the output on regex101.com before dropping it into Excel.
This is one of the rare cases where AI removes a bottleneck that was almost entirely about syntax recall. Regex syntax is dense and easy to forget between uses. Asking a model to translate "match a UK postcode" or "extract the first quoted phrase" produces a pattern in seconds. Just verify with real test data, because models occasionally produce patterns that match the examples you gave but fail on edge cases you did not.
For more on the discipline of evaluating AI output before relying on it, our sister site has a guide on evaluating AI answers when you are not the expert that applies almost directly here.
Where REGEX Earns Its Keep
REGEXTEST replaces validation logic. REGEXEXTRACT replaces extraction logic. REGEXREPLACE replaces cleanup logic. In each category, the regex versions are usually shorter, more flexible, and easier to maintain than the LEFT/RIGHT/MID/FIND chains they replace.
The functions are not a magic wand. They reward thinking about the actual structure of your data and writing a pattern that captures that structure precisely. They punish guessing, in the sense that a pattern that works on the first ten rows but fails on row 117 is just as visible as one that fails everywhere.
For most teams, the right time to learn the three functions is the next time you find yourself writing a chain of three or more legacy text functions to do something that feels like pattern matching. That is the moment regex turns a forty-character formula into a ten-character one and a fragile workbook into a stable one.