我如何检查一个URL是否存在(不是404)在PHP?


当前回答

检查URL是否有效的其他方法可以是:

<?php

  if (isValidURL("http://www.gimepix.com")) {
      echo "URL is valid...";
  } else {
      echo "URL is not valid...";
  }

  function isValidURL($url) {
      $file_headers = @get_headers($url);
      if (strpos($file_headers[0], "200 OK") > 0) {
         return true;
      } else {
        return false;
      }
  }
?>

其他回答

很快:

function http_response($url){
    $resURL = curl_init(); 
    curl_setopt($resURL, CURLOPT_URL, $url); 
    curl_setopt($resURL, CURLOPT_BINARYTRANSFER, 1); 
    curl_setopt($resURL, CURLOPT_HEADERFUNCTION, 'curlHeaderCallback'); 
    curl_setopt($resURL, CURLOPT_FAILONERROR, 1); 
    curl_exec ($resURL); 
    $intReturnCode = curl_getinfo($resURL, CURLINFO_HTTP_CODE); 
    curl_close ($resURL); 
    if ($intReturnCode != 200 && $intReturnCode != 302 && $intReturnCode != 304) { return 0; } else return 1;
}

echo 'google:';
echo http_response('http://www.google.com');
echo '/ ogogle:';
echo http_response('http://www.ogogle.com');
$headers = @get_headers($this->_value);
if(strpos($headers[0],'200')===false)return false;

所以任何时候你接触一个网站,得到200个以上的东西,它会工作

cURL可以返回HTTP代码,我不认为所有额外的代码是必要的?

function urlExists($url=NULL)
    {
        if($url == NULL) return false;
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $data = curl_exec($ch);
        $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch); 
        if($httpcode>=200 && $httpcode<300){
            return true;
        } else {
            return false;
        }
    }
function URLIsValid($URL)
{
    $exists = true;
    $file_headers = @get_headers($URL);
    $InvalidHeaders = array('404', '403', '500');
    foreach($InvalidHeaders as $HeaderVal)
    {
            if(strstr($file_headers[0], $HeaderVal))
            {
                    $exists = false;
                    break;
            }
    }
    return $exists;
}

karim79的get_headers()解决方案并没有为我工作,因为我得到了疯狂的结果与Pinterest。

get_headers(): SSL operation failed with code 1. OpenSSL Error messages: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
)

get_headers(): Failed to enable crypto

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
)

get_headers(https://www.pinterest.com/jonathan_parl/): failed to open stream: operation failed

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
) 

不管怎样,这个开发人员演示了cURL比get_headers()快得多:

http://php.net/manual/fr/function.get-headers.php#104723

由于许多人要求karim79修复的是cURL解决方案,这里是我今天构建的解决方案。

/**
* Send an HTTP request to a the $url and check the header posted back.
*
* @param $url String url to which we must send the request.
* @param $failCodeList Int array list of code for which the page is considered invalid.
*
* @return Boolean
*/
public static function isUrlExists($url, array $failCodeList = array(404)){

    $exists = false;

    if(!StringManager::stringStartWith($url, "http") and !StringManager::stringStartWith($url, "ftp")){

        $url = "https://" . $url;
    }

    if (preg_match(RegularExpression::URL, $url)){

        $handle = curl_init($url);


        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);

        curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);

        curl_setopt($handle, CURLOPT_HEADER, true);

        curl_setopt($handle, CURLOPT_NOBODY, true);

        curl_setopt($handle, CURLOPT_USERAGENT, true);


        $headers = curl_exec($handle);

        curl_close($handle);


        if (empty($failCodeList) or !is_array($failCodeList)){

            $failCodeList = array(404); 
        }

        if (!empty($headers)){

            $exists = true;

            $headers = explode(PHP_EOL, $headers);

            foreach($failCodeList as $code){

                if (is_numeric($code) and strpos($headers[0], strval($code)) !== false){

                    $exists = false;

                    break;  
                }
            }
        }
    }

    return $exists;
}

让我来解释一下旋度选项:

CURLOPT_RETURNTRANSFER:返回一个字符串,而不是在屏幕上显示调用页面。

CURLOPT_SSL_VERIFYPEER: cUrl不会签出证书

CURLOPT_HEADER:在字符串中包含头文件

CURLOPT_NOBODY:不要在字符串中包含body

CURLOPT_USERAGENT:一些站点需要它才能正常运行(例如:https://plus.google.com)


附加说明:在这个函数中,我使用Diego Perini的正则表达式在发送请求之前验证URL:

const URL = "%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu"; //@copyright Diego Perini

附加说明2:我将标题字符串和用户标题[0]分开,以确保只验证返回代码和消息(例如:200、404、405等)。

附加说明3:有时仅验证代码404是不够的(参见单元测试),因此有一个可选的$failCodeList参数提供所有要拒绝的代码列表。

当然,这里还有单元测试(包括所有流行的社交网络)来证明我的代码是合法的:

public function testIsUrlExists(){

//invalid
$this->assertFalse(ToolManager::isUrlExists("woot"));

$this->assertFalse(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque4545646456"));

$this->assertFalse(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque890800"));

$this->assertFalse(ToolManager::isUrlExists("https://instagram.com/mariloubiz1232132/", array(404, 405)));

$this->assertFalse(ToolManager::isUrlExists("https://www.pinterest.com/jonathan_parl1231/"));

$this->assertFalse(ToolManager::isUrlExists("https://regex101.com/546465465456"));

$this->assertFalse(ToolManager::isUrlExists("https://twitter.com/arcadefire4566546"));

$this->assertFalse(ToolManager::isUrlExists("https://vimeo.com/**($%?%$", array(400, 405)));

$this->assertFalse(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666456456456"));


//valid
$this->assertTrue(ToolManager::isUrlExists("www.google.ca"));

$this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));

$this->assertTrue(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque"));

$this->assertTrue(ToolManager::isUrlExists("https://instagram.com/mariloubiz/"));

$this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));

$this->assertTrue(ToolManager::isUrlExists("https://www.pinterest.com/"));

$this->assertTrue(ToolManager::isUrlExists("https://regex101.com"));

$this->assertTrue(ToolManager::isUrlExists("https://twitter.com/arcadefire"));

$this->assertTrue(ToolManager::isUrlExists("https://vimeo.com/"));

$this->assertTrue(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666"));
}

祝大家取得巨大成功,

Jonathan Parent-Lévesque在蒙特利尔报道