How can I check whether a URL exists (i.e. does not return a 404) in PHP?
Current answer
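function urlIsOk($url)
{
    // get_headers() returns false on failure; @ hides the warning it raises.
    $headers = @get_headers($url);
    if ($headers === false || !isset($headers[0])) {
        return false;
    }
    // The status line looks like "HTTP/1.1 200 OK"; the code starts at offset 9.
    $httpStatus = intval(substr($headers[0], 9, 3));
    // Anything below 400 (success or redirect) counts as "exists".
    return $httpStatus > 0 && $httpStatus < 400;
}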
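The fixed offset in `substr($headers[0], 9, 3)` relies on the status line starting with a nine-character prefix such as `HTTP/1.1 `. A minimal sketch of a more tolerant parser follows; `parseStatusCode` is a hypothetical helper name, not part of the answer above, and it is only needed if status lines of another shape (for example `HTTP/2 200`, as reported by some clients) can reach your code.

```php
<?php
/**
 * Extract the numeric status code from an HTTP status line.
 * Returns null when the line does not look like a status line.
 * (parseStatusCode is a hypothetical helper name.)
 */
function parseStatusCode(string $statusLine): ?int
{
    // Accept "HTTP/1.0", "HTTP/1.1", "HTTP/2", ... followed by a 3-digit code.
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})(?:\s|$)#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null;
}

var_dump(parseStatusCode('HTTP/1.1 200 OK')); // int(200)
var_dump(parseStatusCode('HTTP/2 404'));      // int(404)
var_dump(parseStatusCode('not a status'));    // NULL
```

A caller could then treat `null` as "no usable response" and any code from 200 to 399 as "exists".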
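Other answers
I run some tests to see whether the links on my site are still valid, so that I am alerted when third parties change their links. I had a problem with one site that had a badly configured certificate, which meant PHP's get_headers did not work.
So, I read that curl was faster and decided to give it a try. Then I had a problem with LinkedIn, which gave me a 999 error; that turned out to be a user agent issue.
For this test I did not care whether the certificate was valid, and I did not care whether the response was a redirect.
Then I figured I would fall back to get_headers anyway if curl failed...
Give it a go...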
/**
 * Returns true if the $url responds with a usable status code, false otherwise.
 *
 * @param string $url assumes this is a valid url.
 *
 * @return bool
 */
private function urlExists(string $url): bool
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // do not print the response to stdout
    curl_setopt($ch, CURLOPT_NOBODY, true); // send a HEAD request to make it faster
    curl_setopt($ch, CURLOPT_HEADER, true); // just the headers
    // Skip certificate checks: some sys admins can't get SSL right.
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    // set a real user agent to stop LinkedIn getting upset.
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36');
    curl_exec($ch);
    $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    //$error = curl_error($ch); // used for debugging.
    curl_close($ch);

    // 999 is what LinkedIn returns to clients it does not like; the page still exists.
    if (($http_code >= 200 && $http_code < 400) || $http_code === 999) {
        return true;
    }

    // just try get_headers - it might work!
    // Note: this changes the default stream context for the rest of the script.
    stream_context_set_default(
        ['http' => ['method' => 'HEAD']]
    );
    $file_headers = @get_headers($url);
    if ($file_headers !== false && isset($file_headers[0])) {
        $response_code = (int) substr($file_headers[0], 9, 3);
        return $response_code >= 200 && $response_code < 400;
    }

    return false;
}
A bit of an old topic, but... this is how I do it:
$file = 'http://www.google.com';
$file_headers = @get_headers($file);
if ($file_headers) {
    $exists = true;
} else {
    $exists = false;
}
get_headers() returns an array containing the headers the server sent in response to an HTTP request.
$image_path = 'https://your-domain.com/assets/img/image.jpg';
$file_headers = @get_headers($image_path);
// Prints the response out in an array
//print_r($file_headers);
// get_headers() returns false when the request fails, so check that first.
if ($file_headers === false || strpos($file_headers[0], ' 404') !== false) {
    echo 'Failed because path does not exist.<br>';
} else {
    echo 'It works. You\'re good to go!<br>';
}
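One detail worth illustrating: by default the http stream wrapper follows redirects, and get_headers() then returns the headers of every response in the chain, so element 0 holds the first status line rather than the final one. The sketch below picks the last status code out of such an array. `lastStatusCode` is a hypothetical helper, and `$sample` is hand-written data shaped like a get_headers() result, not real server output.

```php
<?php
/**
 * Return the status code of the last response found in a get_headers() array,
 * or null if no status line is present.
 * (lastStatusCode is a hypothetical helper name.)
 */
function lastStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        // Each response in a redirect chain contributes one "HTTP/x.y NNN ..." line.
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Hand-written sample: a 301 redirect followed by the final 200 response.
$sample = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: https://example.com/new',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
];
var_dump(lastStatusCode($sample)); // int(200)
```

Checking the numeric code this way also avoids depending on the exact reason phrase or protocol version in the status line.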
All of the above solutions + extra sugar. (Ultimate AIO solution)
/**
 * Check that given URL is valid and exists.
 * @param string $url URL to check
 * @return bool TRUE when valid | FALSE anyway
 */
function urlExists ( $url ) {
    // Remove all illegal characters from a url
    $url = filter_var($url, FILTER_SANITIZE_URL);

    // Validate URI
    if (filter_var($url, FILTER_VALIDATE_URL) === FALSE
        // check only for http/https schemes.
        || !in_array(strtolower(parse_url($url, PHP_URL_SCHEME)), ['http','https'], true )
    ) {
        return false;
    }

    // Check that URL exists
    $file_headers = @get_headers($url);
    // Look for the 404 code itself rather than one exact status line,
    // so "HTTP/1.0 404 Not Found" is caught as well.
    return !(!$file_headers || strpos($file_headers[0], ' 404') !== false);
}
Example:
var_dump ( urlExists('http://stackoverflow.com/') );
// Output: bool(true)
When trying to find out from PHP whether a URL exists, there are a few things to pay attention to:
- Is the URL itself valid (a string, not empty, good syntax)? This is quick to check server side.
- Waiting for a response might take time and block code execution.
- Not all headers returned by get_headers() are well formed.
- Use curl (if you can).
- Prevent fetching the entire body/content; request only the headers.
- Consider redirecting URLs:
  - Do you want the first code returned?
  - Or follow all redirects and return the last code?
  - You might end up with a 200, but it could redirect using meta tags or JavaScript. Figuring out what happens after is tough.
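Keep in mind that whatever method you use, waiting for a response takes time. All code might (and probably will) halt until you either know the result or the request has timed out.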
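As a rough illustration of bounding that wait when curl is not available, the sketch below builds a stream context that asks for headers only (HEAD) and sets the http wrapper's `timeout` option, then passes it to get_headers(). It assumes PHP 7.1 or later (where get_headers() accepts a context argument); `buildHeadContext`, `fetchHeaders`, and the 5-second default are hypothetical choices for this example.

```php
<?php
/**
 * Build a stream context for a HEAD request with a timeout.
 * (buildHeadContext is a hypothetical helper name.)
 */
function buildHeadContext(float $timeoutSeconds = 5.0)
{
    return stream_context_create([
        'http' => [
            'method'        => 'HEAD',          // request headers only, no body
            'timeout'       => $timeoutSeconds, // http wrapper timeout, in seconds
            'max_redirects' => 5,               // stop after a few redirects
        ],
    ]);
}

/**
 * Fetch response headers using the context above.
 * Returns an array of header lines, or false on failure.
 */
function fetchHeaders(string $url, float $timeoutSeconds = 5.0)
{
    // Unlike stream_context_set_default(), this affects only this one call.
    return @get_headers($url, false, buildHeadContext($timeoutSeconds));
}
```

Passing the context per call keeps the HEAD method and timeout from leaking into unrelated stream operations elsewhere in the script.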
For example: the code below could take a LONG time to display the page if the URLs are invalid or unreachable:
<?php
$urls = getUrls(); // some function getting say 10 or more external links

foreach($urls as $k=>$url){
    // this could potentially take 0-30 seconds each
    // (more or less depending on connection, target site, timeout settings...)
    if( ! isValidUrl($url) ){
        unset($urls[$k]);
    }
}

echo "yay all done! now show my site";
foreach($urls as $url){
    // escape the URL before printing it into HTML
    $safe = htmlspecialchars($url, ENT_QUOTES);
    echo "<a href=\"{$safe}\">{$safe}</a><br/>";
}
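One way to shorten the total wait in a loop like the one above is to run the checks concurrently rather than one after another. The following is a minimal sketch using PHP's curl_multi functions, assuming the curl extension is installed; `checkUrlsInParallel` and the 5-second default timeout are hypothetical, and duplicate URLs in the input collapse into one result because the URL is used as the array key.

```php
<?php
/**
 * Check several URLs concurrently and return [url => HTTP status code].
 * A code of 0 means no HTTP response was received (DNS failure, timeout, ...).
 * (checkUrlsInParallel is a hypothetical helper name.)
 */
function checkUrlsInParallel(array $urls, int $timeoutSeconds = 5): array
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // never print output
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // report the final code
        curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity instead of spinning
        }
    } while ($running > 0 && $status === CURLM_OK);

    $codes = [];
    foreach ($handles as $url => $ch) {
        $codes[$url] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);

    return $codes;
}
```

With this approach the whole batch should take roughly as long as the slowest single request, bounded by the timeout, instead of the sum of all requests.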
The functions below could be helpful; you will probably want to modify them to suit your needs:
function isValidUrl($url){
    // first do some quick sanity checks:
    if(!$url || !is_string($url)){
        return false;
    }
    // quick check url is roughly a valid http request: ( http://blah/... )
    if( ! preg_match('/^http(s)?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(\/.*)?$/i', $url) ){
        return false;
    }
    // the next bit could be slow:
    if(getHttpResponseCode_using_curl($url) != 200){
    // if(getHttpResponseCode_using_getheaders($url) != 200){ // use this one if you cant use curl
        return false;
    }
    // all good!
    return true;
}
function getHttpResponseCode_using_curl($url, $followredirects = true){
    // returns int responsecode, or false (if url does not exist or connection timeout occurs)
    // NOTE: could potentially take up to 0-30 seconds, blocking further code execution (more or less depending on connection, target site, and local timeout settings)
    // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
    // if $followredirects == true : return the LAST known httpcode (when redirected)
    if(! $url || ! is_string($url)){
        return false;
    }
    $ch = @curl_init($url);
    if($ch === false){
        return false;
    }
    @curl_setopt($ch, CURLOPT_HEADER         ,true);    // we want headers
    @curl_setopt($ch, CURLOPT_NOBODY         ,true);    // dont need body
    @curl_setopt($ch, CURLOPT_RETURNTRANSFER ,true);    // catch output (do NOT print!)
    if($followredirects){
        @curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,true);
        @curl_setopt($ch, CURLOPT_MAXREDIRS      ,10);  // fairly random number, but could prevent unwanted endless redirects with followlocation=true
    }else{
        @curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,false);
    }
    // @curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,5);   // fairly random number (seconds)... but could prevent waiting forever to get a result
    // @curl_setopt($ch, CURLOPT_TIMEOUT        ,6);   // fairly random number (seconds)... but could prevent waiting forever to get a result
    // @curl_setopt($ch, CURLOPT_USERAGENT      ,"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1");   // pretend we're a regular browser
    @curl_exec($ch);
    if(@curl_errno($ch)){   // should be 0
        @curl_close($ch);
        return false;
    }
    $code = @curl_getinfo($ch, CURLINFO_HTTP_CODE); // note: php.net documentation shows this returns a string, but really it returns an int
    @curl_close($ch);
    return $code;
}
function getHttpResponseCode_using_getheaders($url, $followredirects = true){
    // returns string responsecode, or false if no responsecode found in headers (or url does not exist)
    // NOTE: could potentially take up to 0-30 seconds, blocking further code execution (more or less depending on connection, target site, and local timeout settings)
    // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
    // if $followredirects == true : return the LAST known httpcode (when redirected)
    if(! $url || ! is_string($url)){
        return false;
    }
    $headers = @get_headers($url);
    if($headers && is_array($headers)){
        if($followredirects){
            // we want the last errorcode, reverse array so we start at the end:
            $headers = array_reverse($headers);
        }
        foreach($headers as $hline){
            // search for things like "HTTP/1.1 200 OK" , "HTTP/1.0 200 OK" , "HTTP/1.1 301 PERMANENTLY MOVED" , "HTTP/1.1 404 Not Found" , etc.
            // note that the exact syntax/version/output differs, so there is some string magic involved here
            // the reason phrase is optional, so the code may be followed by whitespace or end of line
            if(preg_match('/^HTTP\/\S+\s+([1-9][0-9][0-9])(\s|$)/', $hline, $matches) ){// "HTTP/*** ### ***"
                $code = $matches[1];
                return $code;
            }
        }
        // no HTTP/xxx found in headers:
        return false;
    }
    // no headers :
    return false;
}