如何使用正则表达式(Regex)在微软Excel单元和循环

如何在Excel中使用正则表达式，并利用Excel强大的网格式设置进行数据操作?

函数返回字符串中匹配的模式或替换的值。子遍历一列数据并提取与相邻单元格的匹配项。需要什么设置? Excel中正则表达式的特殊字符是什么?

我知道Regex在很多情况下并不理想(使用或不使用正则表达式?)，因为excel可以使用Left, Mid, Right, Instr类型的命令进行类似的操作。

当前回答

正则表达式用于模式匹配。

要在Excel中使用，请遵循以下步骤:

步骤1:添加VBA参考“Microsoft VBScript Regular Expressions 5.5”

选择“开发人员”选项卡(我没有这个选项卡，我该怎么办?) 从“代码”功能区中选择“Visual Basic”图标在“Microsoft Visual Basic for Applications”窗口中，从顶部菜单中选择“Tools”。选择“引用” 选中“Microsoft VBScript正则表达式5.5”旁边的复选框，将其包含在工作簿中。单击“OK”

步骤2:定义你的模式

基本定义:

——范围。

例如，a-z匹配小写字母从a到z 例如，0-5匹配从0到5的任何数字

[]精确匹配括号内的一个对象。

例:[a]与字母a匹配例如:[abc]匹配单个字母，可以是a, b或c 例:[a-z]匹配字母表中任何一个小写字母。

()为返回目的将不同的匹配项分组。请看下面的例子。

{}用于在它之前定义的模式的重复副本。

例如:[a]{2}匹配两个连续的小写字母a: aa 例:[a]{1,3}匹配至少一个最多三个小写字母a, aa, aaa

+匹配至少一个或多个在它之前定义的模式。

例如，a+将匹配连续的aa, aa, aaa，等等

吗?匹配0或在它之前定义的模式之一。

例:图案可能存在，也可能不存在，但只能匹配一次。例如,[a - z] ?匹配空字符串或任何单个小写字母。

*匹配0个或多个之前定义的模式。

通配符表示可能存在也可能不存在的模式。例如:[a-z]*匹配空字符串或小写字母字符串。

．匹配除换行符\n以外的任何字符

例如:a.匹配以a开头，除\n之外的任何结尾的两个字符串

| OR运算符

例如，a|b表示a或b都可以匹配。例如，红色|白色|橙色正好匹配其中一种颜色。

^ NOT运算符

例如[^0-9]字符不能包含数字例:[^aA]字符不能是小写字母a或大写字母a

\转义后面的特殊字符(覆盖上述行为)

例如\。， \\， \(， \?， \$， \^

锚定模式:

^匹配必须发生在字符串的开头

第一个字符必须是小写字母例:^[0-9]第一个字符必须是数字。

$ Match必须出现在字符串的末尾

例如，$最后一个字符必须是小写字母a

先后顺序表:

Order  Name                Representation
1      Parentheses         ( )
2      Multipliers         ? + * {m,n} {m, n}?
3      Sequence & Anchors  abc ^ $
4      Alternation         |

预定义字符缩写:

abr    same as       meaning
\d     [0-9]         Any single digit
\D     [^0-9]        Any single character that's not a digit
\w     [a-zA-Z0-9_]  Any word character
\W     [^a-zA-Z0-9_] Any non-word character
\s     [ \r\t\n\f]   Any space character
\S     [^ \r\t\n\f]  Any non-space character
\n     [\n]          New line

示例1:作为宏运行

下面的示例宏查看单元格A1中的值，以确定前1或2个字符是否是数字。如果是，则删除它们，并显示字符串的其余部分。如果没有，则会出现一个框，告诉您没有找到匹配项。单元格A1的值为12abc将返回abc，值为1abc将返回abc，值为abc123将返回“Not Matched”，因为数字不在字符串的开头。

Private Sub simpleRegex()
    Dim strPattern As String: strPattern = "^[0-9]{1,2}"
    Dim strReplace As String: strReplace = ""
    Dim regEx As New RegExp
    Dim strInput As String
    Dim Myrange As Range
    
    Set Myrange = ActiveSheet.Range("A1")
    
    If strPattern <> "" Then
        strInput = Myrange.Value
        
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With
        
        If regEx.Test(strInput) Then
            MsgBox (regEx.Replace(strInput, strReplace))
        Else
            MsgBox ("Not matched")
        End If
    End If
End Sub

例2:作为单元内函数运行

此示例与示例1相同，但设置为作为单元内函数运行。要使用，将代码更改为:

Function simpleCellRegex(Myrange As Range) As String
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim strReplace As String
    Dim strOutput As String
    
    
    strPattern = "^[0-9]{1,3}"
    
    If strPattern <> "" Then
        strInput = Myrange.Value
        strReplace = ""
        
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With
        
        If regEx.test(strInput) Then
            simpleCellRegex = regEx.Replace(strInput, strReplace)
        Else
            simpleCellRegex = "Not matched"
        End If
    End If
End Function

将字符串(“12abc”)放在单元格A1中。在单元格B1中输入公式=simpleCellRegex(A1)，结果将是“abc”。

例3:循环通过范围

这个示例与示例1相同，但循环遍历了一系列单元格。

Private Sub simpleRegex()
    Dim strPattern As String: strPattern = "^[0-9]{1,2}"
    Dim strReplace As String: strReplace = ""
    Dim regEx As New RegExp
    Dim strInput As String
    Dim Myrange As Range
    
    Set Myrange = ActiveSheet.Range("A1:A5")
    
    For Each cell In Myrange
        If strPattern <> "" Then
            strInput = cell.Value
            
            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With
            
            If regEx.Test(strInput) Then
                MsgBox (regEx.Replace(strInput, strReplace))
            Else
                MsgBox ("Not matched")
            End If
        End If
    Next
End Sub

例4:分离不同的模式

这个示例循环遍历一个范围(A1, A2 & A3)，并查找以三个数字开头的字符串，后面跟着一个字母字符，然后是4个数字。输出使用()将模式匹配分割为相邻的单元格。$1表示在第一个()集合中匹配的第一个模式。

Private Sub splitUpRegexPattern()
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim Myrange As Range
    
    Set Myrange = ActiveSheet.Range("A1:A3")
    
    For Each C In Myrange
        strPattern = "(^[0-9]{3})([a-zA-Z])([0-9]{4})"
        
        If strPattern <> "" Then
            strInput = C.Value
            
            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With
            
            If regEx.test(strInput) Then
                C.Offset(0, 1) = regEx.Replace(strInput, "$1")
                C.Offset(0, 2) = regEx.Replace(strInput, "$2")
                C.Offset(0, 3) = regEx.Replace(strInput, "$3")
            Else
                C.Offset(0, 1) = "(Not matched)"
            End If
        End If
    Next
End Sub

结果:

其他模式示例

String   Regex Pattern                  Explanation
a1aaa    [a-zA-Z][0-9][a-zA-Z]{3}       Single alpha, single digit, three alpha characters
a1aaa    [a-zA-Z]?[0-9][a-zA-Z]{3}      May or may not have preceding alpha character
a1aaa    [a-zA-Z][0-9][a-zA-Z]{0,3}     Single alpha, single digit, 0 to 3 alpha characters
a1aaa    [a-zA-Z][0-9][a-zA-Z]*         Single alpha, single digit, followed by any number of alpha characters

</i8>    \<\/[a-zA-Z][0-9]\>            Exact non-word character except any single alpha followed by any single digit

2014-03-20 19:09:13

其他回答

我需要使用它作为一个单元格函数(如SUM或VLOOKUP)，并发现它很容易:

Make sure you are in a Macro Enabled Excel File (save as xlsm). Open developer tools Alt + F11 Add Microsoft VBScript Regular Expressions 5.5 as in other answers Create the following function either in workbook or in its own module: Function REGPLACE(myRange As Range, matchPattern As String, outputPattern As String) As Variant Dim regex As New VBScript_RegExp_55.RegExp Dim strInput As String strInput = myRange.Value With regex .Global = True .MultiLine = True .IgnoreCase = False .Pattern = matchPattern End With REGPLACE = regex.Replace(strInput, outputPattern) End Function Then you can use in cell with =REGPLACE(B1, "(\w) (\d+)", "$1$2") (ex: "A 243" to "A243")

2017-03-18 02:41:11

正则表达式用于模式匹配。

要在Excel中使用，请遵循以下步骤:

步骤1:添加VBA参考“Microsoft VBScript Regular Expressions 5.5”

步骤2:定义你的模式

基本定义:

——范围。

例如，a-z匹配小写字母从a到z 例如，0-5匹配从0到5的任何数字

[]精确匹配括号内的一个对象。

例:[a]与字母a匹配例如:[abc]匹配单个字母，可以是a, b或c 例:[a-z]匹配字母表中任何一个小写字母。

()为返回目的将不同的匹配项分组。请看下面的例子。

{}用于在它之前定义的模式的重复副本。

例如:[a]{2}匹配两个连续的小写字母a: aa 例:[a]{1,3}匹配至少一个最多三个小写字母a, aa, aaa

+匹配至少一个或多个在它之前定义的模式。

例如，a+将匹配连续的aa, aa, aaa，等等

吗?匹配0或在它之前定义的模式之一。

例:图案可能存在，也可能不存在，但只能匹配一次。例如,[a - z] ?匹配空字符串或任何单个小写字母。

*匹配0个或多个之前定义的模式。

通配符表示可能存在也可能不存在的模式。例如:[a-z]*匹配空字符串或小写字母字符串。

．匹配除换行符\n以外的任何字符

例如:a.匹配以a开头，除\n之外的任何结尾的两个字符串

| OR运算符

例如，a|b表示a或b都可以匹配。例如，红色|白色|橙色正好匹配其中一种颜色。

^ NOT运算符

例如[^0-9]字符不能包含数字例:[^aA]字符不能是小写字母a或大写字母a

\转义后面的特殊字符(覆盖上述行为)

例如\。， \\， \(， \?， \$， \^

锚定模式:

^匹配必须发生在字符串的开头

第一个字符必须是小写字母例:^[0-9]第一个字符必须是数字。

$ Match必须出现在字符串的末尾

例如，$最后一个字符必须是小写字母a

先后顺序表:

Order  Name                Representation
1      Parentheses         ( )
2      Multipliers         ? + * {m,n} {m, n}?
3      Sequence & Anchors  abc ^ $
4      Alternation         |

预定义字符缩写:

abr    same as       meaning
\d     [0-9]         Any single digit
\D     [^0-9]        Any single character that's not a digit
\w     [a-zA-Z0-9_]  Any word character
\W     [^a-zA-Z0-9_] Any non-word character
\s     [ \r\t\n\f]   Any space character
\S     [^ \r\t\n\f]  Any non-space character
\n     [\n]          New line

示例1:作为宏运行

Private Sub simpleRegex()
    Dim strPattern As String: strPattern = "^[0-9]{1,2}"
    Dim strReplace As String: strReplace = ""
    Dim regEx As New RegExp
    Dim strInput As String
    Dim Myrange As Range
    
    Set Myrange = ActiveSheet.Range("A1")
    
    If strPattern <> "" Then
        strInput = Myrange.Value
        
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With
        
        If regEx.Test(strInput) Then
            MsgBox (regEx.Replace(strInput, strReplace))
        Else
            MsgBox ("Not matched")
        End If
    End If
End Sub

例2:作为单元内函数运行

此示例与示例1相同，但设置为作为单元内函数运行。要使用，将代码更改为:

Function simpleCellRegex(Myrange As Range) As String
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim strReplace As String
    Dim strOutput As String
    
    
    strPattern = "^[0-9]{1,3}"
    
    If strPattern <> "" Then
        strInput = Myrange.Value
        strReplace = ""
        
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With
        
        If regEx.test(strInput) Then
            simpleCellRegex = regEx.Replace(strInput, strReplace)
        Else
            simpleCellRegex = "Not matched"
        End If
    End If
End Function

将字符串(“12abc”)放在单元格A1中。在单元格B1中输入公式=simpleCellRegex(A1)，结果将是“abc”。

例3:循环通过范围

这个示例与示例1相同，但循环遍历了一系列单元格。

Private Sub simpleRegex()
    Dim strPattern As String: strPattern = "^[0-9]{1,2}"
    Dim strReplace As String: strReplace = ""
    Dim regEx As New RegExp
    Dim strInput As String
    Dim Myrange As Range
    
    Set Myrange = ActiveSheet.Range("A1:A5")
    
    For Each cell In Myrange
        If strPattern <> "" Then
            strInput = cell.Value
            
            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With
            
            If regEx.Test(strInput) Then
                MsgBox (regEx.Replace(strInput, strReplace))
            Else
                MsgBox ("Not matched")
            End If
        End If
    Next
End Sub

例4:分离不同的模式

Private Sub splitUpRegexPattern()
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim Myrange As Range
    
    Set Myrange = ActiveSheet.Range("A1:A3")
    
    For Each C In Myrange
        strPattern = "(^[0-9]{3})([a-zA-Z])([0-9]{4})"
        
        If strPattern <> "" Then
            strInput = C.Value
            
            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With
            
            If regEx.test(strInput) Then
                C.Offset(0, 1) = regEx.Replace(strInput, "$1")
                C.Offset(0, 2) = regEx.Replace(strInput, "$2")
                C.Offset(0, 3) = regEx.Replace(strInput, "$3")
            Else
                C.Offset(0, 1) = "(Not matched)"
            End If
        End If
    Next
End Sub

结果:

其他模式示例

String   Regex Pattern                  Explanation
a1aaa    [a-zA-Z][0-9][a-zA-Z]{3}       Single alpha, single digit, three alpha characters
a1aaa    [a-zA-Z]?[0-9][a-zA-Z]{3}      May or may not have preceding alpha character
a1aaa    [a-zA-Z][0-9][a-zA-Z]{0,3}     Single alpha, single digit, 0 to 3 alpha characters
a1aaa    [a-zA-Z][0-9][a-zA-Z]*         Single alpha, single digit, followed by any number of alpha characters

</i8>    \<\/[a-zA-Z][0-9]\>            Exact non-word character except any single alpha followed by any single digit

2014-03-20 19:09:13

下面是一个regex_subst()函数。例子:

=regex_subst("watermellon", "[aeiou]", "")
---> wtrmlln
=regex_subst("watermellon", "[^aeiou]", "")
---> aeeo

下面是简化的代码(至少对我来说更简单)。我不知道如何使用上面的例子来构建一个合适的输出模式:

Function regex_subst( _
     strInput As String _
   , matchPattern As String _
   , Optional ByVal replacePattern As String = "" _
) As Variant
    Dim inputRegexObj As New VBScript_RegExp_55.RegExp

    With inputRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = matchPattern
    End With

    regex_subst = inputRegexObj.Replace(strInput, replacePattern)
End Function

2017-07-28 21:19:35

我不想启用一个参考库，因为我需要我的脚本是可移植的。Dim foo作为新的VBScript_RegExp_55。RegExp行导致了User Defined Type Not Defined错误，但我找到了一个适合我的解决方案。

更新RE评论w/ @chrisneilsen:

I was under the impression that enabling a reference library was tied to the local computers settings, but it is in fact, tied directly to the workbook. So, you can enable a reference library, share a macro enabled workbook and the end user wouldn't have to enable the library as well. Caveat: The advantage to Late Binding is that the developer does not have to worry about the wrong version of an object library being installed on the user's computer. This likely would not be an issue w/ the VBScript_RegExp_55.RegExp library, but I'm not sold that the "performance" benifit is worth it for me at this time, as we are talking imperceptible milliseconds in my code. I felt this deserved an update to help others understand. If you enable the reference library, you can use "early bind", but if you don't, as far as I can tell, the code will work fine, but you need to "late bind" and loose on some performance/debugging features.

来源:https://peltiertech.com/Excel/EarlyLateBinding.html

您要做的是在单元格A1中放置一个示例字符串，然后测试您的strPattern。一旦工作，调整然后rng按需要。

Public Sub RegExSearch()
'https://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops
'https://wellsr.com/vba/2018/excel/vba-regex-regular-expressions-guide/
'https://www.vitoshacademy.com/vba-regex-in-excel/
    Dim regexp As Object
    'Dim regex As New VBScript_RegExp_55.regexp 'Caused "User Defined Type Not Defined" Error
    Dim rng As Range, rcell As Range
    Dim strInput As String, strPattern As String
    
    Set regexp = CreateObject("vbscript.regexp")
    Set rng = ActiveSheet.Range("A1:A1")
        
    strPattern = "([a-z]{2})([0-9]{8})"
    'Search for 2 Letters then 8 Digits Eg: XY12345678 = Matched

    With regexp
        .Global = False
        .MultiLine = False
        .ignoreCase = True
        .Pattern = strPattern
    End With

    For Each rcell In rng.Cells

        If strPattern <> "" Then
            strInput = rcell.Value

            If regexp.test(strInput) Then
                MsgBox rcell & " Matched in Cell " & rcell.Address
            Else
                MsgBox "No Matches!"
            End If
        End If
    Next
End Sub

2019-03-22 02:57:02

为了增加有价值的内容，我想创建一个关于为什么有时VBA中的RegEx不是理想的提醒。并不是支持所有的表达式，而是可能抛出一个Error 5017，并可能让作者猜测(这是我自己的受害者)。

虽然我们可以找到一些关于什么是支持的来源，但知道哪些元字符等是不支持的将是有帮助的。更深入的解释可以在这里找到。该资料中提到:

"Although "VBScript正则表达式…5.5版本实现了很多基本的正则表达式特性，这些特性在以前版本的VBScript. ...中是没有的JavaScript和VBScript实现了perl风格的正则表达式。然而，它们缺乏Perl和其他现代正则表达式中提供的大量高级特性:

因此，不支持的有:

Start of String ancor \A, alternatively use the ^ caret to match postion before 1st char in string End of String ancor \Z, alternatively use the $ dollar sign to match postion after last char in string Positive LookBehind, e.g.: (?<=a)b (whilst postive LookAhead is supported) Negative LookBehind, e.g.: (?<!a)b (whilst negative LookAhead is supported) Atomic Grouping Possessive Quantifiers Unicode e.g.: \{uFFFF} Named Capturing Groups. Alternatively use Numbered Capturing Groups Inline modifiers, e.g.: /i (case sensitivity) or /g (global) etc. Set these through the RegExp object properties > RegExp.Global = True and RegExp.IgnoreCase = True if available. Conditionals Regular Expression Comments. Add these with regular ' comments in script

我已经在VBA中使用正则表达式不止一次地碰壁了。通常使用LookBehind，但有时我甚至忘记修饰符。我自己没有经历过上面提到的所有这些背景，但我认为我应该参考一些更深入的信息。请随意评论/更正/补充。正则表达式.info提供了丰富的信息。

附注:你提到了常规的VBA方法和函数，我可以确认它们(至少对我自己)在RegEx失败的地方以自己的方式提供了帮助。

2020-02-12 16:19:40

如何使用正则表达式(Regex)在微软Excel单元和循环

推荐文章

最新文章

标签