如何在Excel中使用正则表达式,并利用Excel强大的网格式设置进行数据操作?

函数返回字符串中匹配的模式或替换的值。 子遍历一列数据并提取与相邻单元格的匹配项。 需要什么设置? Excel中正则表达式的特殊字符是什么?


我知道Regex在很多情况下并不理想(使用或不使用正则表达式?),因为excel可以使用Left, Mid, Right, Instr类型的命令进行类似的操作。


当前回答

要在Excel公式中直接使用正则表达式,以下UDF(用户自定义函数)可能会有所帮助。它或多或少直接将正则表达式功能暴露为excel函数。

它是如何工作的

它需要2-3个参数。

要对其使用正则表达式的文本。 正则表达式。 指定结果外观的格式字符串。它可以包含$0、$1、$2等等。$0是整个匹配,$1及以上分别对应正则表达式中的匹配组。默认为$0。

一些例子

提取电子邮件地址:

=regex("Peter Gordon: some@email.com, 47", "\w+@\w+\.\w+")
=regex("Peter Gordon: some@email.com, 47", "\w+@\w+\.\w+", "$0")

结果:some@email.com

提取几个子字符串:

=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "E-Mail: $2, Name: $1")

搜索结果:E-Mail: some@email.com,姓名:Peter Gordon

将单个单元格中的组合字符串分解为多个单元格中的组件:

=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "$" & 1)
=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "$" & 2)

结果:彼得·戈登some@email.com…

如何使用

要使用此UDF,请执行以下操作(大致基于此Microsoft页面)。他们有一些很好的附加信息!)

In Excel in a Macro enabled file ('.xlsm') push ALT+F11 to open the Microsoft Visual Basic for Applications Editor. Add VBA reference to the Regular Expressions library (shamelessly copied from Portland Runners++ answer): Click on Tools -> References (please excuse the german screenshot) Find Microsoft VBScript Regular Expressions 5.5 in the list and tick the checkbox next to it. Click OK. Click on Insert Module. If you give your module a different name make sure the Module does not have the same name as the UDF below (e.g. naming the Module Regex and the function regex causes #NAME! errors). In the big text window in the middle insert the following: Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object Dim replaceNumber As Integer With inputRegexObj .Global = True .MultiLine = True .IgnoreCase = False .Pattern = matchPattern End With With outputRegexObj .Global = True .MultiLine = True .IgnoreCase = False .Pattern = "\$(\d+)" End With With outReplaceRegexObj .Global = True .MultiLine = True .IgnoreCase = False End With Set inputMatches = inputRegexObj.Execute(strInput) If inputMatches.Count = 0 Then regex = False Else Set replaceMatches = outputRegexObj.Execute(outputPattern) For Each replaceMatch In replaceMatches replaceNumber = replaceMatch.SubMatches(0) outReplaceRegexObj.Pattern = "\$" & replaceNumber If replaceNumber = 0 Then outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value) Else If replaceNumber > inputMatches(0).SubMatches.Count Then 'regex = "A to high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "." regex = CVErr(xlErrValue) Exit Function Else outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1)) End If End If Next regex = outputPattern End If End Function Save and close the Microsoft Visual Basic for Applications Editor window.

其他回答

要在Excel公式中直接使用正则表达式,以下UDF(用户自定义函数)可能会有所帮助。它或多或少直接将正则表达式功能暴露为excel函数。

它是如何工作的

它需要2-3个参数。

要对其使用正则表达式的文本。 正则表达式。 指定结果外观的格式字符串。它可以包含$0、$1、$2等等。$0是整个匹配,$1及以上分别对应正则表达式中的匹配组。默认为$0。

一些例子

提取电子邮件地址:

=regex("Peter Gordon: some@email.com, 47", "\w+@\w+\.\w+")
=regex("Peter Gordon: some@email.com, 47", "\w+@\w+\.\w+", "$0")

结果:some@email.com

提取几个子字符串:

=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "E-Mail: $2, Name: $1")

搜索结果:E-Mail: some@email.com,姓名:Peter Gordon

将单个单元格中的组合字符串分解为多个单元格中的组件:

=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "$" & 1)
=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "$" & 2)

结果:彼得·戈登some@email.com…

如何使用

要使用此UDF,请执行以下操作(大致基于此Microsoft页面)。他们有一些很好的附加信息!)

In Excel in a Macro enabled file ('.xlsm') push ALT+F11 to open the Microsoft Visual Basic for Applications Editor. Add VBA reference to the Regular Expressions library (shamelessly copied from Portland Runners++ answer): Click on Tools -> References (please excuse the german screenshot) Find Microsoft VBScript Regular Expressions 5.5 in the list and tick the checkbox next to it. Click OK. Click on Insert Module. If you give your module a different name make sure the Module does not have the same name as the UDF below (e.g. naming the Module Regex and the function regex causes #NAME! errors). In the big text window in the middle insert the following: Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object Dim replaceNumber As Integer With inputRegexObj .Global = True .MultiLine = True .IgnoreCase = False .Pattern = matchPattern End With With outputRegexObj .Global = True .MultiLine = True .IgnoreCase = False .Pattern = "\$(\d+)" End With With outReplaceRegexObj .Global = True .MultiLine = True .IgnoreCase = False End With Set inputMatches = inputRegexObj.Execute(strInput) If inputMatches.Count = 0 Then regex = False Else Set replaceMatches = outputRegexObj.Execute(outputPattern) For Each replaceMatch In replaceMatches replaceNumber = replaceMatch.SubMatches(0) outReplaceRegexObj.Pattern = "\$" & replaceNumber If replaceNumber = 0 Then outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value) Else If replaceNumber > inputMatches(0).SubMatches.Count Then 'regex = "A to high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "." regex = CVErr(xlErrValue) Exit Function Else outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1)) End If End If Next regex = outputPattern End If End Function Save and close the Microsoft Visual Basic for Applications Editor window.

这不是一个直接的答案,但可能会为你提供一个更有效的选择。这就是谷歌表有几个内置的正则表达式函数,这些可以非常方便,并帮助绕过Excel中的一些技术程序。显然,在个人电脑上使用Excel有一些优势,但对于大多数用户来说,谷歌表格将提供相同的体验,并可能在文档的可移植性和共享方面提供一些好处。

他们提供

REGEXEXTRACT:根据正则表达式提取匹配的子字符串。

REGEXREPLACE:使用正则表达式将文本字符串的一部分替换为不同的文本字符串。

替换:用字符串中的新文本替换现有文本。

REPLACE:用不同的文本字符串替换文本字符串的一部分。

你可以像这样直接把这些输入到单元格中,然后生成你想要的任何东西

=REGEXMATCH(A2, "[0-9]+")

它们也可以很好地与其他函数组合,如IF语句,如下所示:

=IF(REGEXMATCH(E8,"MiB"),REGEXEXTRACT(E8,"\d*\.\d*|\d*")/1000,IF(REGEXMATCH(E8,"GiB"),REGEXEXTRACT(E8,"\d*\.\d*|\d*"),"")

希望这为那些对Excel的VBS组件感到畏惧的用户提供了一个简单的解决方案。

以下是我的尝试:

Function RegParse(ByVal pattern As String, ByVal html As String)
    Dim regex   As RegExp
    Set regex = New RegExp
    
    With regex
        .IgnoreCase = True  'ignoring cases while regex engine performs the search.
        .pattern = pattern  'declaring regex pattern.
        .Global = False     'restricting regex to find only first match.
        
        If .Test(html) Then         'Testing if the pattern matches or not
            mStr = .Execute(html)(0)        '.Execute(html)(0) will provide the String which matches with Regex
            RegParse = .Replace(mStr, "$1") '.Replace function will replace the String with whatever is in the first set of braces - $1.
        Else
            RegParse = "#N/A"
        End If 
    End With
End Function

正则表达式用于模式匹配。

要在Excel中使用,请遵循以下步骤:

步骤1:添加VBA参考“Microsoft VBScript Regular Expressions 5.5”

选择“开发人员”选项卡(我没有这个选项卡,我该怎么办?) 从“代码”功能区中选择“Visual Basic”图标 在“Microsoft Visual Basic for Applications”窗口中,从顶部菜单中选择“Tools”。 选择“引用” 选中“Microsoft VBScript正则表达式5.5”旁边的复选框,将其包含在工作簿中。 单击“OK”

步骤2:定义你的模式

基本定义:

——范围。

例如,a-z匹配小写字母从a到z 例如,0-5匹配从0到5的任何数字

[]精确匹配括号内的一个对象。

例:[a]与字母a匹配 例如:[abc]匹配单个字母,可以是a, b或c 例:[a-z]匹配字母表中任何一个小写字母。

()为返回目的将不同的匹配项分组。请看下面的例子。

{}用于在它之前定义的模式的重复副本。

例如:[a]{2}匹配两个连续的小写字母a: aa 例:[a]{1,3}匹配至少一个最多三个小写字母a, aa, aaa

+匹配至少一个或多个在它之前定义的模式。

例如,a+将匹配连续的aa, aa, aaa,等等

吗?匹配0或在它之前定义的模式之一。

例:图案可能存在,也可能不存在,但只能匹配一次。 例如,[a - z] ?匹配空字符串或任何单个小写字母。

*匹配0个或多个之前定义的模式。

通配符表示可能存在也可能不存在的模式。 例如:[a-z]*匹配空字符串或小写字母字符串。

. 匹配除换行符\n以外的任何字符

例如:a.匹配以a开头,除\n之外的任何结尾的两个字符串

| OR运算符

例如,a|b表示a或b都可以匹配。 例如,红色|白色|橙色正好匹配其中一种颜色。

^ NOT运算符

例如[^0-9]字符不能包含数字 例:[^aA]字符不能是小写字母a或大写字母a

\转义后面的特殊字符(覆盖上述行为)

例如\。, \\, \(, \?, \$, \^


锚定模式:

^匹配必须发生在字符串的开头

第一个字符必须是小写字母 例:^[0-9]第一个字符必须是数字。

$ Match必须出现在字符串的末尾

例如,$最后一个字符必须是小写字母a


先后顺序表:

Order  Name                Representation
1      Parentheses         ( )
2      Multipliers         ? + * {m,n} {m, n}?
3      Sequence & Anchors  abc ^ $
4      Alternation         |

预定义字符缩写:

abr    same as       meaning
\d     [0-9]         Any single digit
\D     [^0-9]        Any single character that's not a digit
\w     [a-zA-Z0-9_]  Any word character
\W     [^a-zA-Z0-9_] Any non-word character
\s     [ \r\t\n\f]   Any space character
\S     [^ \r\t\n\f]  Any non-space character
\n     [\n]          New line

示例1:作为宏运行

下面的示例宏查看单元格A1中的值,以确定前1或2个字符是否是数字。如果是,则删除它们,并显示字符串的其余部分。如果没有,则会出现一个框,告诉您没有找到匹配项。单元格A1的值为12abc将返回abc,值为1abc将返回abc,值为abc123将返回“Not Matched”,因为数字不在字符串的开头。

Private Sub simpleRegex()
    Dim strPattern As String: strPattern = "^[0-9]{1,2}"
    Dim strReplace As String: strReplace = ""
    Dim regEx As New RegExp
    Dim strInput As String
    Dim Myrange As Range
    
    Set Myrange = ActiveSheet.Range("A1")
    
    If strPattern <> "" Then
        strInput = Myrange.Value
        
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With
        
        If regEx.Test(strInput) Then
            MsgBox (regEx.Replace(strInput, strReplace))
        Else
            MsgBox ("Not matched")
        End If
    End If
End Sub

例2:作为单元内函数运行

此示例与示例1相同,但设置为作为单元内函数运行。要使用,将代码更改为:

Function simpleCellRegex(Myrange As Range) As String
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim strReplace As String
    Dim strOutput As String
    
    
    strPattern = "^[0-9]{1,3}"
    
    If strPattern <> "" Then
        strInput = Myrange.Value
        strReplace = ""
        
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With
        
        If regEx.test(strInput) Then
            simpleCellRegex = regEx.Replace(strInput, strReplace)
        Else
            simpleCellRegex = "Not matched"
        End If
    End If
End Function

将字符串(“12abc”)放在单元格A1中。在单元格B1中输入公式=simpleCellRegex(A1),结果将是“abc”。


例3:循环通过范围

这个示例与示例1相同,但循环遍历了一系列单元格。

Private Sub simpleRegex()
    Dim strPattern As String: strPattern = "^[0-9]{1,2}"
    Dim strReplace As String: strReplace = ""
    Dim regEx As New RegExp
    Dim strInput As String
    Dim Myrange As Range
    
    Set Myrange = ActiveSheet.Range("A1:A5")
    
    For Each cell In Myrange
        If strPattern <> "" Then
            strInput = cell.Value
            
            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With
            
            If regEx.Test(strInput) Then
                MsgBox (regEx.Replace(strInput, strReplace))
            Else
                MsgBox ("Not matched")
            End If
        End If
    Next
End Sub

例4:分离不同的模式

这个示例循环遍历一个范围(A1, A2 & A3),并查找以三个数字开头的字符串,后面跟着一个字母字符,然后是4个数字。输出使用()将模式匹配分割为相邻的单元格。$1表示在第一个()集合中匹配的第一个模式。

Private Sub splitUpRegexPattern()
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim Myrange As Range
    
    Set Myrange = ActiveSheet.Range("A1:A3")
    
    For Each C In Myrange
        strPattern = "(^[0-9]{3})([a-zA-Z])([0-9]{4})"
        
        If strPattern <> "" Then
            strInput = C.Value
            
            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With
            
            If regEx.test(strInput) Then
                C.Offset(0, 1) = regEx.Replace(strInput, "$1")
                C.Offset(0, 2) = regEx.Replace(strInput, "$2")
                C.Offset(0, 3) = regEx.Replace(strInput, "$3")
            Else
                C.Offset(0, 1) = "(Not matched)"
            End If
        End If
    Next
End Sub

结果:


其他模式示例

String   Regex Pattern                  Explanation
a1aaa    [a-zA-Z][0-9][a-zA-Z]{3}       Single alpha, single digit, three alpha characters
a1aaa    [a-zA-Z]?[0-9][a-zA-Z]{3}      May or may not have preceding alpha character
a1aaa    [a-zA-Z][0-9][a-zA-Z]{0,3}     Single alpha, single digit, 0 to 3 alpha characters
a1aaa    [a-zA-Z][0-9][a-zA-Z]*         Single alpha, single digit, followed by any number of alpha characters

</i8>    \<\/[a-zA-Z][0-9]\>            Exact non-word character except any single alpha followed by any single digit

为那些赶时间的人扩展patszim的答案。

Open Excel workbook. Alt+F11 to open VBA/Macros window. Add reference to regex under Tools then References and selecting Microsoft VBScript Regular Expression 5.5 Insert a new module (code needs to reside in the module otherwise it doesn't work). In the newly inserted module, add the following code: Function RegxFunc(strInput As String, regexPattern As String) As String Dim regEx As New RegExp With regEx .Global = True .MultiLine = True .IgnoreCase = False .pattern = regexPattern End With If regEx.Test(strInput) Then Set matches = regEx.Execute(strInput) RegxFunc = matches(0).Value Else RegxFunc = "not matched" End If End Function The regex pattern is placed in one of the cells and absolute referencing is used on it. Function will be tied to workbook that its created in. If there's a need for it to be used in different workbooks, store the function in Personal.XLSB