如何检查给定的字符串是否是有效的URL地址?
我对正则表达式的知识是基本的,不允许我从我已经在网上看到的数百个正则表达式中进行选择。
如何检查给定的字符串是否是有效的URL地址?
我对正则表达式的知识是基本的,不允许我从我已经在网上看到的数百个正则表达式中进行选择。
当前回答
最佳正则表达式是这里最佳答案的组合!哈哈哈!我刚刚测试了它们,并把最好的放在一起!我稍微改变了一下,只有一个捕获组!我在这个页面的源代码中找到了637个url !只有几个假阳性!
((?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)|(?:(?:(?:[A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+(?::[0-9]+)?|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)(?:(?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)|(?:(?:(?:(?:[A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)(?:(?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?))|(?:(?:(?:[\\w]+:)?//)?(?:(?:[\\d\\w]|%[a-fA-f\\d]{2,2})+(?::(?:[\\d\\w]|%[a-fA-f\\d]{2,2})+)?@)?(?:[\\d\\w][-\\d\\w]{0,253}[\\d\\w]\\.)+[\\w]{2,4}(?::[\\d]+)?(?:/(?:[-+_~.\\d\\w]|%[a-fA-f\\d]{2,2})*)*(?:\\?(?:&?(?:[-+_~.\\d\\w]|%[a-fA-f\\d]{2,2})=?)*)?(?:#(?:[-+_~.\\d\\w]|%[a-fA-f\\d]{2,2})*)?)|(?:https?:\/\/(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9])(?::?\d*)\/?(?:[a-z_\/0-9\-#.]*)\??(?:[a-z_\/0-9\-#=&]*)|(?:(?:(?:https?:)?(?:\/?\/))(?:(?:[\d\w]|%[a-fA-f\d]{2,2})+(?::(?:[\d\w]|%[a-fA-f\d]{2,2})+)?@)?(?:[\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(?::[\d]+)?(?:/(?:[-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(?:\?(?:&?(?:[-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(?:#(?:[-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?)|(?:(?:https?|ftp)://(?:www\d?|[a-zA-Z0-9]+)?\.[a-zA-Z0-9-]+(?:\:|\.)(?:[a-zA-Z0-9.]+|(?:\d+)?)(?:[/?:].*)?)|(?:\b(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:\/[^\s]*)?\b))
其他回答
这里有一个很好的规则,它涵盖了所有可能的情况:端口,参数等等
/(https?:\/\/(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9])(:?\d*)\/?([a-z_\/0-9\-#.]*)\??([a-z_\/0-9\-#=&]*)/g
function validateURL(textval) {
var urlregex = new RegExp(
"^(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*$");
return urlregex.test(textval);
}
匹配 http://www.asdah.com/~joe | ftp://ftp.asdah.co.uk:2828/asdah%20asdah.gif | https://asdah.gov/asdh-ah.as
这个很适合我。(https ? | ftp): / / (www \ d ? | [a-zA-Z0-9] +) ? \ [a-zA-Z0-9。 -]+(\:|\.)([ a-zA-Z0-9) + | (\ d +)?)([/?:].*)?
function validateURL(textval) {
var urlregex = new RegExp(
"^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$");
return urlregex.test(textval);
}
匹配 http://site.com/dir/file.php?var=moo | ftp://user:pass@site.com:21/file/dir
Non-Matches site。com | http://site.com/dir//
我一直在写一篇深入的文章,讨论使用正则表达式进行URI验证。它基于RFC3986。
正则表达式URI验证
虽然这篇文章还不完整,但我已经提出了一个PHP函数,它在验证HTTP和FTP url方面做得非常好。以下是当前版本:
// function url_valid($url) { Rev:20110423_2000
//
// Return associative array of valid URI components, or FALSE if $url is not
// RFC-3986 compliant. If the passed URL begins with: "www." or "ftp.", then
// "http://" or "ftp://" is prepended and the corrected full-url is stored in
// the return array with a key name "url". This value should be used by the caller.
//
// Return value: FALSE if $url is not valid, otherwise array of URI components:
// e.g.
// Given: "http://www.jmrware.com:80/articles?height=10&width=75#fragone"
// Array(
// [scheme] => http
// [authority] => www.jmrware.com:80
// [userinfo] =>
// [host] => www.jmrware.com
// [IP_literal] =>
// [IPV6address] =>
// [ls32] =>
// [IPvFuture] =>
// [IPv4address] =>
// [regname] => www.jmrware.com
// [port] => 80
// [path_abempty] => /articles
// [query] => height=10&width=75
// [fragment] => fragone
// [url] => http://www.jmrware.com:80/articles?height=10&width=75#fragone
// )
function url_valid($url) {
if (strpos($url, 'www.') === 0) $url = 'http://'. $url;
if (strpos($url, 'ftp.') === 0) $url = 'ftp://'. $url;
if (!preg_match('/# Valid absolute URI having a non-empty, valid DNS host.
^
(?P<scheme>[A-Za-z][A-Za-z0-9+\-.]*):\/\/
(?P<authority>
(?:(?P<userinfo>(?:[A-Za-z0-9\-._~!$&\'()*+,;=:]|%[0-9A-Fa-f]{2})*)@)?
(?P<host>
(?P<IP_literal>
\[
(?:
(?P<IPV6address>
(?: (?:[0-9A-Fa-f]{1,4}:){6}
| ::(?:[0-9A-Fa-f]{1,4}:){5}
| (?: [0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){4}
| (?:(?:[0-9A-Fa-f]{1,4}:){0,1}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){3}
| (?:(?:[0-9A-Fa-f]{1,4}:){0,2}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){2}
| (?:(?:[0-9A-Fa-f]{1,4}:){0,3}[0-9A-Fa-f]{1,4})?:: [0-9A-Fa-f]{1,4}:
| (?:(?:[0-9A-Fa-f]{1,4}:){0,4}[0-9A-Fa-f]{1,4})?::
)
(?P<ls32>[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}
| (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
)
| (?:(?:[0-9A-Fa-f]{1,4}:){0,5}[0-9A-Fa-f]{1,4})?:: [0-9A-Fa-f]{1,4}
| (?:(?:[0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4})?::
)
| (?P<IPvFuture>[Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&\'()*+,;=:]+)
)
\]
)
| (?P<IPv4address>(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))
| (?P<regname>(?:[A-Za-z0-9\-._~!$&\'()*+,;=]|%[0-9A-Fa-f]{2})+)
)
(?::(?P<port>[0-9]*))?
)
(?P<path_abempty>(?:\/(?:[A-Za-z0-9\-._~!$&\'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*)
(?:\?(?P<query> (?:[A-Za-z0-9\-._~!$&\'()*+,;=:@\\/?]|%[0-9A-Fa-f]{2})*))?
(?:\#(?P<fragment> (?:[A-Za-z0-9\-._~!$&\'()*+,;=:@\\/?]|%[0-9A-Fa-f]{2})*))?
$
/mx', $url, $m)) return FALSE;
switch ($m['scheme']) {
case 'https':
case 'http':
if ($m['userinfo']) return FALSE; // HTTP scheme does not allow userinfo.
break;
case 'ftps':
case 'ftp':
break;
default:
return FALSE; // Unrecognized URI scheme. Default to FALSE.
}
// Validate host name conforms to DNS "dot-separated-parts".
if ($m['regname']) { // If host regname specified, check for DNS conformance.
if (!preg_match('/# HTTP DNS host name.
^ # Anchor to beginning of string.
(?!.{256}) # Overall host length is less than 256 chars.
(?: # Group dot separated host part alternatives.
[A-Za-z0-9]\. # Either a single alphanum followed by dot
| # or... part has more than one char (63 chars max).
[A-Za-z0-9] # Part first char is alphanum (no dash).
[A-Za-z0-9\-]{0,61} # Internal chars are alphanum plus dash.
[A-Za-z0-9] # Part last char is alphanum (no dash).
\. # Each part followed by literal dot.
)* # Zero or more parts before top level domain.
(?: # Explicitly specify top level domains.
com|edu|gov|int|mil|net|org|biz|
info|name|pro|aero|coop|museum|
asia|cat|jobs|mobi|tel|travel|
[A-Za-z]{2}) # Country codes are exactly two alpha chars.
\.? # Top level domain can end in a dot.
$ # Anchor to end of string.
/ix', $m['host'])) return FALSE;
}
$m['url'] = $url;
for ($i = 0; isset($m[$i]); ++$i) unset($m[$i]);
return $m; // return TRUE == array of useful named $matches plus the valid $url.
}
这个函数使用了两个正则表达式;一个用于匹配有效通用uri的子集(具有非空主机的绝对uri),另一个用于验证DNS“点分隔部分”主机名。虽然这个函数目前只验证HTTP和FTP方案,但它的结构使它可以很容易地扩展以处理其他方案。