如何检查给定的字符串是否是有效的URL地址?
我对正则表达式的知识是基本的,不允许我从我已经在网上看到的数百个正则表达式中进行选择。
如何检查给定的字符串是否是有效的URL地址?
我对正则表达式的知识是基本的,不允许我从我已经在网上看到的数百个正则表达式中进行选择。
当前回答
最佳正则表达式是这里最佳答案的组合!哈哈哈!我刚刚测试了它们,并把最好的放在一起!我稍微改变了一下,只有一个捕获组!我在这个页面的源代码中找到了637个url !只有几个假阳性!
((?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)|(?:(?:(?:[A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+(?::[0-9]+)?|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)(?:(?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)|(?:(?:(?:(?:[A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)(?:(?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?))|(?:(?:(?:[\\w]+:)?//)?(?:(?:[\\d\\w]|%[a-fA-f\\d]{2,2})+(?::(?:[\\d\\w]|%[a-fA-f\\d]{2,2})+)?@)?(?:[\\d\\w][-\\d\\w]{0,253}[\\d\\w]\\.)+[\\w]{2,4}(?::[\\d]+)?(?:/(?:[-+_~.\\d\\w]|%[a-fA-f\\d]{2,2})*)*(?:\\?(?:&?(?:[-+_~.\\d\\w]|%[a-fA-f\\d]{2,2})=?)*)?(?:#(?:[-+_~.\\d\\w]|%[a-fA-f\\d]{2,2})*)?)|(?:https?:\/\/(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9])(?::?\d*)\/?(?:[a-z_\/0-9\-#.]*)\??(?:[a-z_\/0-9\-#=&]*)|(?:(?:(?:https?:)?(?:\/?\/))(?:(?:[\d\w]|%[a-fA-f\d]{2,2})+(?::(?:[\d\w]|%[a-fA-f\d]{2,2})+)?@)?(?:[\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(?::[\d]+)?(?:/(?:[-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(?:\?(?:&?(?:[-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(?:#(?:[-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?)|(?:(?:https?|ftp)://(?:www\d?|[a-zA-Z0-9]+)?\.[a-zA-Z0-9-]+(?:\:|\.)(?:[a-zA-Z0-9.]+|(?:\d+)?)(?:[/?:].*)?)|(?:\b(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:\/[^\s]*)?\b))
其他回答
黑(http / \ /处事之道会(s) ?):地球,随便(www。)? a-zA-Z0-9 @:%._\+~#=]{ 地球,随便2,256出于美观。黑a-z铝可不,2、6出于美观\ b(黑-a-zA-Z0-9 @:%_\+.~#?&//=]*)
关于眼睑问题,他的回答是“这是基于我对URI规范的阅读。”谢谢眼睑,你的是我寻求的完美的解决方案,因为它是基于URI规范!出色的工作。:)
我不得不做了两个修改。第一个在PHP (v5.2.10)中使用preg_match()函数使regexp正确匹配IP地址url的方法。
我不得不在管道周围的“IP Address”上方的行中添加一组括号:
)|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}(?#
不知道为什么。
我还将顶级域名的最小长度从3个字母减少到2个字母,以支持.co。英国和类似国家。
最后的代码:
/^(https?|ftp):\/\/(?# protocol
)(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+(?# username
)(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?(?# password
)@)?(?# auth requires @
)((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*(?# domain segments AND
)[a-z][a-z0-9-]*[a-z0-9](?# top level domain OR
)|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}(?#
)(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])(?# IP address
))(:\d+)?(?# port
))(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*(?# path
)(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)(?# query string
)?)?)?(?# path and query string optional
)(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?(?# fragment
)$/i
这个修改后的版本没有根据URI规范进行检查,所以我不能保证它的合规性,它被修改为处理本地网络环境中的URL和两位tld以及其他类型的Web URL,并在我使用的PHP设置中更好地工作。
作为PHP代码:
define('URL_FORMAT',
'/^(https?):\/\/'. // protocol
'(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+'. // username
'(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?'. // password
'@)?(?#'. // auth requires @
')((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*'. // domain segments AND
'[a-z][a-z0-9-]*[a-z0-9]'. // top level domain OR
'|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}'.
'(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])'. // IP address
')(:\d+)?'. // port
')(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*'. // path
'(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)'. // query string
'?)?)?'. // path and query string optional
'(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?'. // fragment
'$/i');
下面是一个PHP测试程序,它使用正则表达式验证各种url:
<?php
define('URL_FORMAT',
'/^(https?):\/\/'. // protocol
'(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+'. // username
'(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?'. // password
'@)?(?#'. // auth requires @
')((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*'. // domain segments AND
'[a-z][a-z0-9-]*[a-z0-9]'. // top level domain OR
'|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}'.
'(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])'. // IP address
')(:\d+)?'. // port
')(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*'. // path
'(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)'. // query string
'?)?)?'. // path and query string optional
'(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?'. // fragment
'$/i');
/**
* Verify the syntax of the given URL.
*
* @access public
* @param $url The URL to verify.
* @return boolean
*/
function is_valid_url($url) {
if (str_starts_with(strtolower($url), 'http://localhost')) {
return true;
}
return preg_match(URL_FORMAT, $url);
}
/**
* String starts with something
*
* This function will return true only if input string starts with
* niddle
*
* @param string $string Input string
* @param string $niddle Needle string
* @return boolean
*/
function str_starts_with($string, $niddle) {
return substr($string, 0, strlen($niddle)) == $niddle;
}
/**
* Test a URL for validity and count results.
* @param url url
* @param expected expected result (true or false)
*/
$numtests = 0;
$passed = 0;
function test_url($url, $expected) {
global $numtests, $passed;
$numtests++;
$valid = is_valid_url($url);
echo "URL Valid?: " . ($valid?"yes":"no") . " for URL: $url. Expected: ".($expected?"yes":"no").". ";
if($valid == $expected) {
echo "PASS\n"; $passed++;
} else {
echo "FAIL\n";
}
}
echo "URL Tests:\n\n";
test_url("http://localserver/projects/public/assets/javascript/widgets/UserBoxMenu/widget.css", true);
test_url("http://www.google.com", true);
test_url("http://www.google.co.uk/projects/my%20folder/test.php", true);
test_url("https://myserver.localdomain", true);
test_url("http://192.168.1.120/projects/index.php", true);
test_url("http://192.168.1.1/projects/index.php", true);
test_url("http://projectpier-server.localdomain/projects/public/assets/javascript/widgets/UserBoxMenu/widget.css", true);
test_url("https://2.4.168.19/project-pier?c=test&a=b", true);
test_url("https://localhost/a/b/c/test.php?c=controller&arg1=20&arg2=20", true);
test_url("http://user:password@localhost/a/b/c/test.php?c=controller&arg1=20&arg2=20", true);
echo "\n$passed out of $numtests tests passed.\n\n";
?>
再次感谢regex的虚实!
我想我找到了一个更通用的regexp来验证url,特别是网站
(https?:\/\/)?(www\.)[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)|(https?:\/\/)?(www\.)?(?!ww)[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)
它不允许例如www.something或http://www或http://www.something
点击这里查看:http://regexr.com/3e4a2
这将匹配所有url
有或没有http/https 不管有没有WWW
...包括子域名和那些新的顶级域名扩展名如 .museum, .academy, .foundation 等等,最多可以有63个字符(不仅仅是。com, .net, .info等)
(([\w]+:)?//)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?
因为目前可用的顶级域名扩展名的最大长度是13个字符,例如。international,您可以将表达式中的数字63更改为13,以防止有人滥用它。
作为javascript
var urlreg=/(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?/; $('textarea').on('input',function(){ var url = $(this).val(); $(this).toggleClass('invalid', urlreg.test(url) == false) }); $('textarea').trigger('input'); textarea{color:green;} .invalid{color:red;} <script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script> <textarea>http://www.google.com</textarea> <textarea>http//www.google.com</textarea> <textarea>googlecom</textarea> <textarea>https://www.google.com</textarea>
维基百科文章:所有互联网顶级域名的列表
非验证uri引用解析器
为了便于参考,这里是IETF规范:(TXT | HTML)。特别地,附录b用正则表达式解析URI引用演示了如何解析有效的正则表达式。这被描述为,
这是一个非验证URI引用解析器的例子,它将接受任何给定的字符串并提取URI组件。
下面是它们提供的正则表达式:
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
正如其他人所说,最好将此留给您已经在使用的库/框架。