How can I check whether a URL exists (is not a 404) in PHP?
Current answer
The easy way is cURL (and it's faster, too):
<?php
$mylinks = "http://site.com/page.html";
$handlerr = curl_init($mylinks);
curl_setopt($handlerr, CURLOPT_RETURNTRANSFER, TRUE);
$resp = curl_exec($handlerr);
$ht = curl_getinfo($handlerr, CURLINFO_HTTP_CODE);
// 404 means the page does not exist; 0 means no HTTP response at all
if ($ht == 404 || $ht == 0) {
    echo 'NO';
} else {
    echo 'OK';
}
?>
Other answers
cURL isn't available on some servers; in that case you can use this code:
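<?php
$url = 'http://www.example.com';
// get_headers() returns false on failure (e.g. DNS error)
$array = @get_headers($url);
$string = ($array !== false) ? $array[0] : '';
// strpos() can return 0, so compare strictly against false
if (strpos($string, "200") !== false)
{
echo 'url exists';
}
else
{
echo 'url does not exist';
}
?>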
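One caveat with the get_headers() approach: it follows redirects by default, so the returned array can contain several status lines (for example a 301 followed by a 200), and $array[0] is only the first one. A minimal sketch, assuming you care about the status of the final response, scans the array for the last line that starts with `HTTP/`. The helper name `finalStatusCode` is hypothetical, not a PHP built-in.

```php
<?php
/**
 * Return the status code of the last response in a get_headers() result,
 * or null if there is none (for example when get_headers() returned false).
 * Hypothetical helper for illustration; not a PHP built-in.
 *
 * @param array|false $headers Result of get_headers($url).
 */
function finalStatusCode($headers): ?int
{
    if (!is_array($headers)) {
        return null;
    }
    $code = null;
    foreach ($headers as $key => $line) {
        // Status lines look like "HTTP/1.1 301 Moved Permanently". After a redirect
        // there is one per response, so keep overwriting with the latest match.
        if (is_int($key) && is_string($line)
            && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Usage (performs a network request):
// $code = finalStatusCode(@get_headers('http://www.example.com'));
// echo ($code !== null && $code !== 404) ? 'url exists' : 'url does not exist';
```

Returning null rather than false for "no status line" keeps a failed request distinguishable from any real status code.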
$url = 'http://google.com';
$not_url = 'stp://google.com';
if (@file_get_contents($url)): echo "Found '$url'!";
else: echo "Can't find '$url'.";
endif;
if (@file_get_contents($not_url)): echo "Found '$not_url'!";
else: echo "Can't find '$not_url'.";
endif;
// Found 'http://google.com'!Can't find 'stp://google.com'.
karim79's get_headers() solution didn't work for me, because I got strange results with Pinterest:
get_headers(): SSL operation failed with code 1. OpenSSL Error messages: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
Array
(
[url] => https://www.pinterest.com/jonathan_parl/
[exists] =>
)
get_headers(): Failed to enable crypto
Array
(
[url] => https://www.pinterest.com/jonathan_parl/
[exists] =>
)
get_headers(https://www.pinterest.com/jonathan_parl/): failed to open stream: operation failed
Array
(
[url] => https://www.pinterest.com/jonathan_parl/
[exists] =>
)
Anyway, this developer demonstrates that cURL is much faster than get_headers():
http://php.net/manual/fr/function.get-headers.php#104723
Since many people asked karim79 to fix his cURL solution, here is the one I built today.
/**
* Send an HTTP request to $url and check the status line sent back.
*
* @param string $url URL to which the request is sent.
* @param int[] $failCodeList List of status codes for which the page is considered invalid.
*
* @return Boolean
*/
public static function isUrlExists($url, array $failCodeList = array(404)){
$exists = false;
if(!StringManager::stringStartWith($url, "http") and !StringManager::stringStartWith($url, "ftp")){
$url = "https://" . $url;
}
if (preg_match(RegularExpression::URL, $url)){
$handle = curl_init($url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($handle, CURLOPT_HEADER, true);
curl_setopt($handle, CURLOPT_NOBODY, true);
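curl_setopt($handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; UrlChecker/1.0)'); // CURLOPT_USERAGENT takes a string; this value is only an example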
$headers = curl_exec($handle);
curl_close($handle);
if (empty($failCodeList) or !is_array($failCodeList)){
$failCodeList = array(404);
}
if (!empty($headers)){
$exists = true;
$headers = explode(PHP_EOL, $headers);
foreach($failCodeList as $code){
if (is_numeric($code) and strpos($headers[0], strval($code)) !== false){
$exists = false;
break;
}
}
}
}
return $exists;
}
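Let me explain the cURL options:
CURLOPT_RETURNTRANSFER: return the response as a string instead of printing the fetched page.
CURLOPT_SSL_VERIFYPEER: cURL will not verify the server's certificate.
CURLOPT_HEADER: include the headers in the returned string.
CURLOPT_NOBODY: don't fetch the body (cURL sends a HEAD request).
CURLOPT_USERAGENT: some sites need it to respond properly (e.g. https://plus.google.com).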
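The function above depends on the author's own StringManager and RegularExpression classes, which aren't shown. A self-contained sketch of the same idea, assuming PHP 8 and the cURL extension, swaps in str_starts_with() for the prefix check and filter_var() with FILTER_VALIDATE_URL for the RegularExpression::URL check (filter_var is more permissive; for example it accepts a host without a TLD such as "https://woot"). It reads the status with curl_getinfo(CURLINFO_HTTP_CODE) instead of searching the raw header text, and it leaves certificate verification at cURL's default (on); you can disable it as the author does if your server lacks a CA bundle. The names `normalizeUrl` and `urlExists` are hypothetical.

```php
<?php
/**
 * Prefix bare hosts with https:// and validate the result.
 * Returns the normalized URL, or null if it is not a valid URL.
 * filter_var() stands in for the author's regex here (assumption).
 */
function normalizeUrl(string $url): ?string
{
    if (!str_starts_with($url, 'http') && !str_starts_with($url, 'ftp')) {
        $url = 'https://' . $url;
    }
    return filter_var($url, FILTER_VALIDATE_URL) !== false ? $url : null;
}

/**
 * Send a HEAD request and report whether the final status is not in $failCodeList.
 * Requires the cURL extension; performs a network request.
 */
function urlExists(string $url, array $failCodeList = [404]): bool
{
    $url = normalizeUrl($url);
    if ($url === null) {
        return false;
    }
    $handle = curl_init($url);
    curl_setopt_array($handle, [
        CURLOPT_RETURNTRANSFER => true, // don't print the response
        CURLOPT_NOBODY         => true, // HEAD request: headers only
        CURLOPT_FOLLOWLOCATION => true, // report the status after redirects
        CURLOPT_TIMEOUT        => 10,   // seconds; arbitrary example value
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; UrlChecker/1.0)', // example string
    ]);
    $ok = curl_exec($handle) !== false;
    $code = (int) curl_getinfo($handle, CURLINFO_HTTP_CODE);
    // The handle is freed when it goes out of scope (PHP 8 CurlHandle object).
    // Code 0 means no HTTP response at all (DNS failure, timeout, TLS error, ...).
    return $ok && $code !== 0 && !in_array($code, array_map('intval', $failCodeList), true);
}
```

Usage would mirror the unit tests below, e.g. `urlExists('https://instagram.com/mariloubiz1232132/', [404, 405])`; only normalizeUrl() can be checked without network access.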
Additional note: in this function I use Diego Perini's regex to validate the URL before sending the request:
const URL = "%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu"; //@copyright Diego Perini
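Additional note 2: I explode the header string and use $headers[0] so that only the status code and message are checked (e.g. 200, 404, 405, etc.).
Additional note 3: sometimes checking only for 404 is not enough (see the unit tests), so the optional $failCodeList parameter lets you supply every code that should be rejected.
And of course, here are the unit tests (covering all the popular social networks) to prove that my code works: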
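Notes 2 and 3 boil down to matching the code on the status line against a reject list. The original does this with strpos(), which looks for the digits anywhere in the line. A minimal sketch that instead extracts the three-digit code with a regular expression and compares it as an integer, so a fail code can only match the actual status code and never part of the protocol version or reason phrase, might look like this. The function names `statusCodeFromLine` and `isRejectedStatus` are hypothetical.

```php
<?php
/**
 * Extract the status code from a status line such as "HTTP/1.1 404 Not Found".
 * Returns null if the line is not a status line.
 */
function statusCodeFromLine(string $statusLine): ?int
{
    // trim() drops the trailing "\r" left over when headers are split on "\n".
    return preg_match('#^HTTP/\S+\s+(\d{3})\b#', trim($statusLine), $m) ? (int) $m[1] : null;
}

/**
 * True if the status line carries one of the rejected codes (default: 404),
 * or if it is not a recognizable status line at all.
 */
function isRejectedStatus(string $statusLine, array $failCodeList = [404]): bool
{
    $code = statusCodeFromLine($statusLine);
    if ($code === null) {
        return true;
    }
    return in_array($code, array_map('intval', $failCodeList), true);
}
```

Treating an unparseable first line as rejected is a design choice here; the original instead keeps $exists = true unless a fail code is found.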
public function testIsUrlExists(){
//invalid
$this->assertFalse(ToolManager::isUrlExists("woot"));
$this->assertFalse(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque4545646456"));
$this->assertFalse(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque890800"));
$this->assertFalse(ToolManager::isUrlExists("https://instagram.com/mariloubiz1232132/", array(404, 405)));
$this->assertFalse(ToolManager::isUrlExists("https://www.pinterest.com/jonathan_parl1231/"));
$this->assertFalse(ToolManager::isUrlExists("https://regex101.com/546465465456"));
$this->assertFalse(ToolManager::isUrlExists("https://twitter.com/arcadefire4566546"));
$this->assertFalse(ToolManager::isUrlExists("https://vimeo.com/**($%?%$", array(400, 405)));
$this->assertFalse(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666456456456"));
//valid
$this->assertTrue(ToolManager::isUrlExists("www.google.ca"));
$this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));
$this->assertTrue(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque"));
$this->assertTrue(ToolManager::isUrlExists("https://instagram.com/mariloubiz/"));
$this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));
$this->assertTrue(ToolManager::isUrlExists("https://www.pinterest.com/"));
$this->assertTrue(ToolManager::isUrlExists("https://regex101.com"));
$this->assertTrue(ToolManager::isUrlExists("https://twitter.com/arcadefire"));
$this->assertTrue(ToolManager::isUrlExists("https://vimeo.com/"));
$this->assertTrue(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666"));
}
Great success to all,
Jonathan Parent-Lévesque from Montreal
Here is a solution that reads only the first byte of the source. If file_get_contents() fails, it returns false. This also works for remote files such as images.
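function urlExists($url)
{
    // Read at most 1 byte; file_get_contents() returns false if the URL can't be opened.
    // Compare strictly: a first byte of "0" or an empty page is falsy but still a valid response.
    return @file_get_contents($url, false, NULL, 0, 1) !== false;
}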