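I'm trying to find the fastest way to check whether a file exists in standard C++11, 14, 17, or C. I have thousands of files, and before doing anything with them I need to check that they all exist. What can I write in place of /* SOMETHING */ in the following function?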
inline bool exist(const std::string& name)
{
/* SOMETHING */
}
Current answer
I use this piece of code, and so far it has worked fine for me. It doesn't use many fancy C++ features:
#include <fstream>
bool is_file_exist(const char *fileName)
{
std::ifstream infile(fileName);
return infile.good();
}
Other answers
I wrote a test program that ran each of these methods 100,000 times, half on files that exist and half on files that don't.
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>
#include <string>
#include <fstream>
inline bool exists_test0 (const std::string& name) {
std::ifstream f(name.c_str());
return f.good();
}
inline bool exists_test1 (const std::string& name) {
if (FILE *file = fopen(name.c_str(), "r")) {
fclose(file);
return true;
} else {
return false;
}
}
inline bool exists_test2 (const std::string& name) {
return ( access( name.c_str(), F_OK ) != -1 );
}
inline bool exists_test3 (const std::string& name) {
struct stat buffer;
return (stat (name.c_str(), &buffer) == 0);
}
Results for the total time to run the 100,000 calls, averaged over 5 runs:
Method | Time |
---|---|
exists_test0 (ifstream) | 0.485s |
exists_test1 (FILE fopen) | 0.302s |
exists_test2 (posix access()) | 0.202s |
exists_test3 (posix stat()) | 0.134s |
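The stat() function gave the best performance on my system (Linux, compiled with g++). If for some reason you refuse to use POSIX functions, the standard fopen call is your best bet.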
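The test program itself is not shown above. As a rough sketch of how such a timing loop might look, assuming std::chrono::steady_clock as the timer: time_checks and exists_stat are hypothetical names introduced here, not part of the original benchmark, and the absolute numbers will differ from system to system.

```cpp
#include <sys/stat.h>   // POSIX stat()
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Same check as exists_test3, repeated so this sketch stands alone.
inline bool exists_stat(const std::string& name) {
    struct stat buffer;
    return stat(name.c_str(), &buffer) == 0;
}

// Hypothetical helper (not from the original benchmark): calls `check` on
// every path, `rounds` times over, and returns the elapsed time in seconds.
template <typename Check>
double time_checks(Check check, const std::vector<std::string>& paths, int rounds) {
    using clock = std::chrono::steady_clock;
    std::size_t hits = 0;  // tally results so the calls are not optimized away
    const auto start = clock::now();
    for (int r = 0; r < rounds; ++r) {
        for (const auto& p : paths) {
            hits += check(p) ? 1 : 0;
        }
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    volatile std::size_t sink = hits;  // force the tally to be materialized
    (void)sink;
    return elapsed.count();
}
```

Calling time_checks(exists_stat, paths, rounds) with a list that mixes existing and missing files, and repeating the call for each candidate function, should give comparable totals.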
There's also an even simpler way:
#include <fstream>
#include <iostream>
void FileExists(std::string myfile){
std::ifstream file(myfile.c_str());
if (file) {
std::cout << "file exists" << std::endl;
}
else {
std::cout << "file doesn't exist" << std::endl;
}
}
int main() {
FileExists("myfile.txt");
return 0;
}
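all_of (begin(R), end(R), [](auto&p){ return exists(p); })
where R is your sequence of path-like things, and exists() is from the future std or current boost. If you roll your own, keep it simple,
bool exists (string const& p) { return ifstream{p}.is_open(); }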
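For reference, the "future std" exists() became std::filesystem::exists in C++17. A minimal sketch of the all_of idea on top of it, assuming a C++17 compiler and a hypothetical helper name all_exist:

```cpp
#include <algorithm>
#include <filesystem>    // C++17
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper: true only if every path in `paths` exists.
inline bool all_exist(const std::vector<std::string>& paths) {
    return std::all_of(paths.begin(), paths.end(), [](const std::string& p) {
        std::error_code ec;  // non-throwing overload; errors count as "missing"
        return std::filesystem::exists(p, ec);
    });
}
```

The error_code overload is used here so that a permission problem on one path is reported as "does not exist" rather than thrown as an exception; whether that is the right policy depends on the caller.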
The branched solution isn't absolutely terrible, and it won't gobble file descriptors,
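#if defined(_WIN32)
#include <windows.h>
#include <shlwapi.h>   // PathFileExistsA; link against Shlwapi.lib
#else
#include <sys/stat.h>
#endif
bool exists (const char* p) {
#if defined(_WIN32)
return p && 0 != PathFileExistsA (p);
#else
struct stat sb;
return p && 0 == stat (p, &sb);
#endif
}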
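The fastest and safest way to test for file existence is not to test for it separately/explicitly at all. That is, see if you can find a way to replace the ordinary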
if(exists(file)) { /* point A */
/* handle existence condition */
return;
}
do_something_with(file); /* point B */
with
r = do_something_with_unless_exists(file);
if(r == 0)
success;
else if(errno == EEXIST)
/* handle existence condition */
else
/* handle other error */
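Besides being faster, this eliminates the race condition (specifically, "TOC/TOU") inherent in the first solution, namely the possibility that the file comes into existence between point A and point B.
Obviously the second solution presupposes that there is an atomic way to perform the do_something_with_unless_exists operation. Often there is a way, but sometimes you have to hunt around for it.
Creating a file: call open() with O_CREAT and O_EXCL.
Creating a file in pure C, if you have C11: call fopen() with "wx". (I only just learned about this one yesterday.)
Making a directory: just call mkdir() and check for errno == EEXIST afterwards.
Acquiring a lock: any locking system worth its salt already has an atomic acquire-the-lock-as-long-as-nobody-else-has-it primitive.
(There are others, but those are the ones I can think of just now.)
[Footnote: In the early days of Unix, there were no specific, dedicated facilities made available to ordinary processes for locking, so if you wanted to set up a mutex, this was typically done by creating a certain empty directory, since the mkdir system call has always been able to fail or succeed atomically based on prior existence or nonexistence.]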
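To make the first item concrete, here is a minimal POSIX sketch of "create the file unless it already exists". The CreateResult enum and the create_unless_exists name are hypothetical, introduced only for illustration, and the 0644 permission bits are an arbitrary choice.

```cpp
#include <fcntl.h>    // open, O_CREAT, O_EXCL (POSIX)
#include <unistd.h>   // close
#include <cerrno>
#include <string>

enum class CreateResult { Created, AlreadyExists, OtherError };

// Hypothetical helper: creates `name` only if it does not already exist.
// O_CREAT | O_EXCL makes the existence check and the creation one atomic step.
inline CreateResult create_unless_exists(const std::string& name) {
    const int fd = open(name.c_str(), O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd >= 0) {
        close(fd);  // a real caller would usually keep writing through fd
        return CreateResult::Created;
    }
    return errno == EEXIST ? CreateResult::AlreadyExists
                           : CreateResult::OtherError;
}
```

There is no window between "check" and "create" here: if two processes race, exactly one of them gets Created and the other sees AlreadyExists.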
It depends on where the files reside. For instance, if they are all supposed to be in the same directory, you can read all the directory entries into a hash table and then check all the names against the hash table. This might be faster on some systems than checking each file individually. The fastest way to check each file individually depends on your system ... if you're writing ANSI C, the fastest way is fopen because it's the only way (a file might exist but not be openable, but you probably really want openable if you need to "do something on it"). C++, POSIX, Windows all offer additional options.
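As an illustration of the directory-listing idea, a minimal sketch assuming C++17's std::filesystem is available (it is not an option in ANSI C). The list_names helper is a hypothetical name, and whether this actually beats per-file checks has to be measured on the target system.

```cpp
#include <filesystem>      // C++17
#include <string>
#include <system_error>
#include <unordered_set>

// Hypothetical helper: collects the names of all entries directly inside
// `dir`. Returns an empty set if the directory cannot be read.
inline std::unordered_set<std::string> list_names(const std::string& dir) {
    std::unordered_set<std::string> names;
    std::error_code ec;
    std::filesystem::directory_iterator it(dir, ec);
    const std::filesystem::directory_iterator end;
    while (!ec && it != end) {
        names.insert(it->path().filename().string());
        it.increment(ec);  // non-throwing advance
    }
    return names;
}
```

After one call to list_names, each expected file name can be checked with names.count(name) != 0, so the directory is read once instead of once per file.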
While I'm at it, let me point out some problems with your question. You say that you want the fastest way, and that you have thousands of files, but then you ask for the code for a function to test a single file (and that function is only valid in C++, not C). This contradicts your requirements by making an assumption about the solution ... a case of the XY problem. You also say "in standard c++11(or)c++(or)c" ... which are all different, and this also is inconsistent with your requirement for speed ... the fastest solution would involve tailoring the code to the target system. The inconsistency in the question is highlighted by the fact that you accepted an answer that gives solutions that are system-dependent and are not standard C or C++.