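Does anyone have a regular expression that matches any valid DNS hostname or IP address?
It is easy to write one that works 95% of the time, but I am hoping for something well tested that exactly matches the latest RFC specifications for DNS hostnames.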
Current answer
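The new Network framework has failable initializers for struct IPv4Address and struct IPv6Address, which handle the IP address portion very easily. Doing this for IPv6 with a regex is tough because of all the shortening rules.
Unfortunately, I don't have an elegant answer for hostnames.
Note that the Network framework is recent, so it may force you to compile for recent OS versions.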
import Network
let tests = ["192.168.4.4","fkjhwojfw","192.168.4.4.4","2620:3","2620::33"]
for test in tests {
    if let _ = IPv4Address(test) {
        debugPrint("\(test) is valid ipv4 address")
    } else if let _ = IPv6Address(test) {
        debugPrint("\(test) is valid ipv6 address")
    } else {
        debugPrint("\(test) is not a valid IP address")
    }
}
output:
"192.168.4.4 is valid ipv4 address"
"fkjhwojfw is not a valid IP address"
"192.168.4.4.4 is not a valid IP address"
"2620:3 is not a valid IP address"
"2620::33 is valid ipv6 address"
Other answers
I can't seem to edit the top post, so I'll add my answer here.
For IP addresses, the simple answer is the egrep example here: http://www.linuxinsight.com/how_to_grep_for_ip_addresses_using_the_gnu_egrep_utility.html
egrep '([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}'
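This does not account for cases like a 0 in the first octet, or values greater than 254 (IP address) or 255 (netmask), though. Maybe an additional if statement would help.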
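The extra range check suggested above could look something like the following. This is only a sketch: `is_valid_ipv4` is a hypothetical helper name, and it assumes that every octet may be anything from 0 to 255 and that leading zeros are acceptable. Tighten the comparison if you want to reject a leading 0 octet or the value 255.

```shell
#!/bin/sh
# is_valid_ipv4 (hypothetical helper): succeed only if $1 is a dotted quad
# whose four octets are each in the range 0-255 (an assumption, see above).
is_valid_ipv4() {
    # Reject anything that contains characters other than digits and dots.
    case $1 in
        *[!0-9.]*) return 1 ;;
    esac
    # Shape check: exactly four groups of one to three digits.
    printf '%s\n' "$1" | grep -Eqx '([0-9]{1,3}\.){3}[0-9]{1,3}' || return 1
    # Range check: split on dots and compare each octet numerically.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    for octet in "$@"; do
        [ "$octet" -le 255 ] || return 1
    done
    return 0
}
```

For example, `is_valid_ipv4 "192.168.4.4" && echo ok` prints `ok`, while `is_valid_ipv4 "256.1.1.1"` returns a non-zero status.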
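As for legal DNS hostnames, provided that you are checking internet hostnames only (and not intranet ones), I wrote the following snippet, a mix of shell and PHP, but the approach should be applicable to any regular expression.
First download and parse the list of legal top-level domain names from IANA: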
tld=$(curl -s http://data.iana.org/TLD/tlds-alpha-by-domain.txt | sed 1d | cut -f1 -d'-' | tr '\n' '|' | sed 's/\(.*\)./\1/')
echo "($tld)"
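That should give you a nice piece of regex code that checks the legality of the top-level domain name, like .com, .org or .ca.
Then add the first part of the expression according to the guidelines found here: http://www.domainit.com/support/faq.mhtml?category=Domain_FAQ&question=9 (any alphanumeric combination plus the '-' symbol; a dash should not appear at the beginning or end of a label).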
(([a-z0-9]+|([a-z0-9]+[-]+[a-z0-9]+))[.])+
Then put it all together (PHP preg_match example):
$pattern = '/^(([a-z0-9]+|([a-z0-9]+[-]+[a-z0-9]+))[.])+(AC|AD|AE|AERO|AF|AG|AI|AL|AM|AN|AO|AQ|AR|ARPA|AS|ASIA|AT|AU|AW|AX|AZ|BA|BB|BD|BE|BF|BG|BH|BI|BIZ|BJ|BM|BN|BO|BR|BS|BT|BV|BW|BY|BZ|CA|CAT|CC|CD|CF|CG|CH|CI|CK|CL|CM|CN|CO|COM|COOP|CR|CU|CV|CX|CY|CZ|DE|DJ|DK|DM|DO|DZ|EC|EDU|EE|EG|ER|ES|ET|EU|FI|FJ|FK|FM|FO|FR|GA|GB|GD|GE|GF|GG|GH|GI|GL|GM|GN|GOV|GP|GQ|GR|GS|GT|GU|GW|GY|HK|HM|HN|HR|HT|HU|ID|IE|IL|IM|IN|INFO|INT|IO|IQ|IR|IS|IT|JE|JM|JO|JOBS|JP|KE|KG|KH|KI|KM|KN|KP|KR|KW|KY|KZ|LA|LB|LC|LI|LK|LR|LS|LT|LU|LV|LY|MA|MC|MD|ME|MG|MH|MIL|MK|ML|MM|MN|MO|MOBI|MP|MQ|MR|MS|MT|MU|MUSEUM|MV|MW|MX|MY|MZ|NA|NAME|NC|NE|NET|NF|NG|NI|NL|NO|NP|NR|NU|NZ|OM|ORG|PA|PE|PF|PG|PH|PK|PL|PM|PN|PR|PRO|PS|PT|PW|PY|QA|RE|RO|RS|RU|RW|SA|SB|SC|SD|SE|SG|SH|SI|SJ|SK|SL|SM|SN|SO|SR|ST|SU|SV|SY|SZ|TC|TD|TEL|TF|TG|TH|TJ|TK|TL|TM|TN|TO|TP|TR|TRAVEL|TT|TV|TW|TZ|UA|UG|UK|US|UY|UZ|VA|VC|VE|VG|VI|VN|VU|WF|WS|XN|XN|XN|XN|XN|XN|XN|XN|XN|XN|XN|YE|YT|YU|ZA|ZM|ZW)[.]?$/i';
if (preg_match($pattern, $matching_string)) {
    // ... do stuff
}
You may also want to add an if statement to check that the string being checked is shorter than 256 characters; see http://www.ops.ietf.org/lists/namedroppers/namedroppers.2003/msg00964.html
I thought about this simple regex pattern for IP address matching: \d+[.]\d+[.]\d+[.]\d+
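The hostname regex of smink does not observe the limitation on the length of individual labels within a hostname. Each label within a valid hostname may be no more than 63 octets long.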
ValidHostnameRegex="^([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])\
(\.([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]))*$"
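Note that the backslash at the end of the first line (above) is Unix shell syntax for splitting the long line. It is not part of the regular expression itself.
Here is just the regular expression alone on a single line: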
^([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])(\.([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]))*$
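You should also check separately that the total length of the hostname does not exceed 255 characters. For more information, please consult RFC-952 and RFC-1123.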
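Combining the per-label regex with the separate total-length check might be sketched as below. The helper name `is_valid_hostname` and the variable `label` are hypothetical. One assumption to flag: inside a bracket expression `grep -E` treats a backslash as a literal character, so this sketch writes the hyphen last in the class instead of escaping it.

```shell
#!/bin/sh
# One label: 1 to 63 letters, digits, or hyphens, with no hyphen at either end.
label='([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]{0,61}[a-zA-Z0-9])'
nl='
'
# is_valid_hostname (hypothetical helper): succeed only if $1 passes both
# the total-length check and the per-label regex.
is_valid_hostname() {
    # Total length must not exceed 255 characters.
    [ "${#1}" -le 255 ] || return 1
    # grep works line by line, so refuse input that spans several lines.
    case $1 in
        *"$nl"*) return 1 ;;
    esac
    # -x anchors the match to the whole line, replacing ^ and $.
    printf '%s\n' "$1" | LC_ALL=C grep -Eqx "${label}(\.${label})*"
}
```

For example, `is_valid_hostname "example.com" && echo "looks valid"` prints the message, while a name with a 64-character label or a label that starts with a hyphen is rejected.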
Regarding IP addresses, it appears that there is some debate on whether to include leading zeros. It was once the common practice and is generally accepted, so I would argue that they should be flagged as valid regardless of the current preference. There is also some ambiguity over whether text before and after the string should be validated and, again, I think it should. 1.2.3.4 is a valid IP but 1.2.3.4.5 is not and neither the 1.2.3.4 portion nor the 2.3.4.5 portion should result in a match. Some of the concerns can be handled with this expression:
grep -E '(^|[^[:alnum:]])(([0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.){3}([0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])([^[:alnum:]]|$)'
The unfortunate part here is that the regex portion that validates an octet is repeated, as is true in many of the offered solutions. Although this is better than four instances of the pattern, the repetition can be eliminated entirely if subroutines are supported in the regex flavor being used. The next example enables those functions with the -P switch of grep and also takes advantage of lookahead and lookbehind functionality. (The function name I selected is 'o' for octet. I could have used 'octet' as the name but wanted to be terse.)
grep -P '(?<![\d\w\.])(?<o>([0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(\.\g<o>){3}(?![\d\w\.])'
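If the IP addresses are in a file containing text in the form of sentences, the handling of the dot character could actually create false negatives, since a period can follow an address without being part of the dotted notation. A variant of the above would fix that: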
grep -P '(?<![\d\w\.])(?<x>([0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(\.\g<x>){3}(?!([\d\w]|\.\d))'