我如何检查一个URL是否存在(不是404)在PHP?
当前回答
当从php中判断url是否存在时,有几件事需要注意:
Is the url itself valid (a string, not empty, good syntax), this is quick to check server side. Waiting for a response might take time and block code execution. Not all headers returned by get_headers() are well formed. Use curl (if you can). Prevent fetching the entire body/content, but only request the headers. Consider redirecting urls: Do you want the first code returned? Or follow all redirects and return the last code? You might end up with a 200, but it could redirect using meta tags or javascript. Figuring out what happens after is tough.
请记住,无论你使用什么方法,等待回复都需要时间。 所有代码都可能(很可能)停止,直到您知道结果或请求超时。
例如:如果url无效或不可达,下面的代码可能需要很长时间才能显示页面:
<?php
$urls = getUrls(); // some function getting say 10 or more external links
foreach($urls as $k=>$url){
// this could potentially take 0-30 seconds each
// (more or less depending on connection, target site, timeout settings...)
if( ! isValidUrl($url) ){
unset($urls[$k]);
}
}
echo "yay all done! now show my site";
foreach($urls as $url){
echo "<a href=\"{$url}\">{$url}</a><br/>";
}
下面的函数可能会有帮助,你可能想修改它们以适应你的需要:
function isValidUrl($url){
// first do some quick sanity checks:
if(!$url || !is_string($url)){
return false;
}
// quick check url is roughly a valid http request: ( http://blah/... )
if( ! preg_match('/^http(s)?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(\/.*)?$/i', $url) ){
return false;
}
// the next bit could be slow:
if(getHttpResponseCode_using_curl($url) != 200){
// if(getHttpResponseCode_using_getheaders($url) != 200){ // use this one if you cant use curl
return false;
}
// all good!
return true;
}
function getHttpResponseCode_using_curl($url, $followredirects = true){
// returns int responsecode, or false (if url does not exist or connection timeout occurs)
// NOTE: could potentially take up to 0-30 seconds , blocking further code execution (more or less depending on connection, target site, and local timeout settings))
// if $followredirects == false: return the FIRST known httpcode (ignore redirects)
// if $followredirects == true : return the LAST known httpcode (when redirected)
if(! $url || ! is_string($url)){
return false;
}
$ch = @curl_init($url);
if($ch === false){
return false;
}
@curl_setopt($ch, CURLOPT_HEADER ,true); // we want headers
@curl_setopt($ch, CURLOPT_NOBODY ,true); // dont need body
@curl_setopt($ch, CURLOPT_RETURNTRANSFER ,true); // catch output (do NOT print!)
if($followredirects){
@curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,true);
@curl_setopt($ch, CURLOPT_MAXREDIRS ,10); // fairly random number, but could prevent unwanted endless redirects with followlocation=true
}else{
@curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,false);
}
// @curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,5); // fairly random number (seconds)... but could prevent waiting forever to get a result
// @curl_setopt($ch, CURLOPT_TIMEOUT ,6); // fairly random number (seconds)... but could prevent waiting forever to get a result
// @curl_setopt($ch, CURLOPT_USERAGENT ,"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"); // pretend we're a regular browser
@curl_exec($ch);
if(@curl_errno($ch)){ // should be 0
@curl_close($ch);
return false;
}
$code = @curl_getinfo($ch, CURLINFO_HTTP_CODE); // note: php.net documentation shows this returns a string, but really it returns an int
@curl_close($ch);
return $code;
}
function getHttpResponseCode_using_getheaders($url, $followredirects = true){
// returns string responsecode, or false if no responsecode found in headers (or url does not exist)
// NOTE: could potentially take up to 0-30 seconds , blocking further code execution (more or less depending on connection, target site, and local timeout settings))
// if $followredirects == false: return the FIRST known httpcode (ignore redirects)
// if $followredirects == true : return the LAST known httpcode (when redirected)
if(! $url || ! is_string($url)){
return false;
}
$headers = @get_headers($url);
if($headers && is_array($headers)){
if($followredirects){
// we want the last errorcode, reverse array so we start at the end:
$headers = array_reverse($headers);
}
foreach($headers as $hline){
// search for things like "HTTP/1.1 200 OK" , "HTTP/1.0 200 OK" , "HTTP/1.1 301 PERMANENTLY MOVED" , "HTTP/1.1 400 Not Found" , etc.
// note that the exact syntax/version/output differs, so there is some string magic involved here
if(preg_match('/^HTTP\/\S+\s+([1-9][0-9][0-9])\s+.*/', $hline, $matches) ){// "HTTP/*** ### ***"
$code = $matches[1];
return $code;
}
}
// no HTTP/xxx found in headers:
return false;
}
// no headers :
return false;
}
其他回答
cURL可以返回HTTP代码,我不认为所有额外的代码是必要的?
function urlExists($url=NULL)
{
if($url == NULL) return false;
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if($httpcode>=200 && $httpcode<300){
return true;
} else {
return false;
}
}
在某些服务器中不能使用curl 你可以用这个代码
<?php
$url = 'http://www.example.com';
$array = get_headers($url);
$string = $array[0];
if(strpos($string,"200"))
{
echo 'url exists';
}
else
{
echo 'url does not exist';
}
?>
以上所有解决方案+额外的糖。(终极AIO解决方案)
/**
* Check that given URL is valid and exists.
* @param string $url URL to check
* @return bool TRUE when valid | FALSE anyway
*/
function urlExists ( $url ) {
// Remove all illegal characters from a url
$url = filter_var($url, FILTER_SANITIZE_URL);
// Validate URI
if (filter_var($url, FILTER_VALIDATE_URL) === FALSE
// check only for http/https schemes.
|| !in_array(strtolower(parse_url($url, PHP_URL_SCHEME)), ['http','https'], true )
) {
return false;
}
// Check that URL exists
$file_headers = @get_headers($url);
return !(!$file_headers || $file_headers[0] === 'HTTP/1.1 404 Not Found');
}
例子:
var_dump ( urlExists('http://stackoverflow.com/') );
// Output: true;
在这里:
$file = 'http://www.example.com/somefile.jpg';
$file_headers = @get_headers($file);
if(!$file_headers || $file_headers[0] == 'HTTP/1.1 404 Not Found') {
$exists = false;
}
else {
$exists = true;
}
从这里和上面帖子的正下方,有一个卷曲的解决方案:
function url_exists($url) {
return curl_init($url) !== false;
}
我运行一些测试,看看我的网站上的链接是否有效-提醒我当第三方改变他们的链接。我有一个网站的问题,有一个配置不良的证书,这意味着php的get_headers不能工作。
所以,我读到卷曲更快,并决定给一个尝试。然后我在领英上遇到了一个问题,给了我一个999错误,后来证明是用户代理的问题。
我不关心证书是否对该测试无效,也不关心响应是否为重定向。
然后我认为使用get_headers无论如何,如果卷曲失败....
试试看....
/**
* returns true/false if the $url is valid.
*
* @param string $url assumes this is a valid url.
*
* @return bool
*/
private function urlExists(string $url): bool
{
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // do not output response in stdout
curl_setopt($ch, CURLOPT_NOBODY, true); // this does a head request to make it faster.
curl_setopt($ch, CURLOPT_HEADER, true); // just the headers
curl_setopt($ch, CURLOPT_SSL_VERIFYSTATUS, false); // turn off that pesky ssl stuff - some sys admins can't get it right.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
// set a real user agent to stop linkedin getting upset.
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36');
curl_exec($ch);
$http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if (($http_code >= 200 && $http_code < 400) || $http_code === 999) {
curl_close($ch);
return true;
}
//$error = curl_error($ch); // used for debugging.
curl_close($ch);
// just try the get_headers - it might work!
stream_context_set_default(
['http' => ['method' => 'HEAD']]
);
$file_headers = @get_headers($url);
if ($file_headers !== false) {
$response_code = substr($file_headers[0], 9, 3);
return $response_code >= 200 && $response_code < 400;
}
return false;
}
推荐文章
- 格式化字节到千字节,兆字节,千兆字节
- 如何在PHP中获得变量名作为字符串?
- 用“+”(数组联合运算符)合并两个数组如何工作?
- 创建url的安全字符是什么?
- Laravel PHP命令未找到
- 如何修复从源代码安装PHP时未发现xml2-config的错误?
- 在PHP中对动态变量名使用大括号
- 如何从对象数组中通过对象属性找到条目?
- 如何从关联数组中删除键及其值?
- PHP字符串中的花括号
- PHP -如何最好地确定当前调用是否来自CLI或web服务器?
- 无法打开流:没有这样的文件或目录
- 在php中生成一个随机密码
- 如何通过PHP检查URL是否存在?
- 如何防止页面刷新时重新提交表单(F5 / CTRL+R)