What is wrong with using feof() to control a read loop? For example:

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
    char *path = "stdin";
    FILE *fp = argc > 1 ? fopen(path=argv[1], "r") : stdin;

    if( fp == NULL ){
        perror(path);
        return EXIT_FAILURE;
    }

    while( !feof(fp) ){  /* THIS IS WRONG */
        /* Read and process data from file… */
    }
    if( fclose(fp) != 0 ){
        perror(path);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

What is wrong with this loop?


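Current answer

It is wrong because (in the absence of a read error) it enters the loop one more time than the author expects. If there is a read error, the loop never terminates.

Consider the following code: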

/* WARNING: demonstration of bad coding technique!! */

#include <stdio.h>
#include <stdlib.h>

FILE *Fopen(const char *path, const char *mode);

int main(int argc, char **argv)
{
    FILE *in;
    unsigned count;

    in = argc > 1 ? Fopen(argv[1], "r") : stdin;
    count = 0;

    /* WARNING: this is a bug */
    while( !feof(in) ) {  /* This is WRONG! */
        fgetc(in);
        count++;
    }
    printf("Number of characters read: %u\n", count);
    return EXIT_SUCCESS;
}

FILE * Fopen(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    if( f == NULL ) {
        perror(path);
        exit(EXIT_FAILURE);
    }
    return f;
}

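This program will consistently print a count one greater than the number of characters in the input stream (assuming no read errors). Consider the case where the input stream is empty: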

$ ./a.out < /dev/null
Number of characters read: 1

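In this case, feof() is called before any data has been read, so it returns false. The loop is entered, fgetc() is called (and returns EOF), and count is incremented. Then feof() is called and returns true, causing the loop to terminate.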

This happens in all such cases. feof() does not return true until after a read on the stream encounters the end of file. The purpose of feof() is NOT to check if the next read will reach the end of file. The purpose of feof() is to determine the status of a previous read function and distinguish between an error condition and the end of the data stream. If fread() returns 0, you must use feof/ferror to decide whether an error occurred or if all of the data was consumed. Similarly if fgetc returns EOF. feof() is only useful after fread has returned zero or fgetc has returned EOF. Before that happens, feof() will always return 0.

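It is always necessary to check the return value of the read (an fread(), an fscanf(), or an fgetc()) before calling feof().

Even worse, consider the case where a read error occurs. In that case, fgetc() returns EOF, feof() returns false, and the loop never terminates. In all cases where while(!feof(p)) is used, there must be at least a check for ferror() inside the loop, or at the very least the while condition should be replaced with while(!feof(p) && !ferror(p)). Otherwise there is a very real possibility of an infinite loop, probably spewing all sorts of garbage as invalid data is processed.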

So, in summary, although I cannot state with certainty that there is never a situation in which it may be semantically correct to write "while(!feof(f))" (although there must be another check inside the loop with a break to avoid an infinite loop on a read error), it is the case that it is almost certainly always wrong. And even if a case ever arose where it would be correct, it is so idiomatically wrong that it would not be the right way to write the code. Anyone seeing that code should immediately hesitate and say, "that's a bug". And possibly slap the author (unless the author is your boss, in which case discretion is advised).
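For comparison, here is a minimal sketch of how the counting loop above could be written so that the return value of fgetc() controls the loop and ferror() is consulted only after a read has failed. The helper name count_chars and its return convention are assumptions made for this illustration; they are not part of the original program.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original answer): counts the characters
 * available on `in`. Returns 0 if the stream ended cleanly, -1 on a read error. */
int count_chars(FILE *in, unsigned long *count)
{
    int c;  /* int, not char: it must be able to hold EOF */

    *count = 0;
    /* The result of the read itself is the loop condition. */
    while ((c = fgetc(in)) != EOF) {
        (*count)++;
    }
    /* Only now, after a read has failed, do feof()/ferror() carry meaning. */
    return ferror(in) ? -1 : 0;
}
```

Called on an empty stream, this reports zero characters rather than one, and a read error ends the loop instead of spinning forever.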

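Other answers

No, it is not always wrong. If your loop condition is "while we have not tried to read past the end of file", then you use while (!feof(f)). This is, however, not a common loop condition; usually you want to test for something else (such as "can I read more"). while (!feof(f)) is not wrong, it is just used wrong.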

feof() indicates if one has tried to read past the end of file. That means it has little predictive effect: if it is true, you are sure that the next input operation will fail (you aren't sure the previous one failed, BTW), but if it is false, you aren't sure the next input operation will succeed. Moreover, input operations may fail for reasons other than the end of file (a format error for formatted input, a pure IO failure -- disk failure, network timeout -- for all input kinds), so even if you could be predictive about the end of file (and anybody who has tried to implement Ada's one, which is predictive, will tell you it can be complex if you need to skip spaces, and that it has undesirable effects on interactive devices -- sometimes forcing the input of the next line before starting the handling of the previous one), you would have to be able to handle a failure.

So the correct idiom in C is to loop with the success of the IO operation as the loop condition, and then test the cause of the failure. For instance:

while (fgets(line, sizeof(line), file)) {
    /* note that fgets does not strip the terminating \n; checking for its
       presence allows handling lines longer than sizeof(line), not shown here */
    ...
}
if (ferror(file)) {
   /* IO failure */
} else if (feof(file)) {
   /* format error (not possible with fgets, but would be with fscanf) or end of file */
} else {
   /* format error (not possible with fgets, but would be with fscanf) */
}
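The comment in the loop above mentions that checking for the trailing newline lets you handle lines longer than the buffer. A minimal sketch of that idea follows, assuming a hypothetical helper count_lines and a deliberately tiny 16-byte buffer so that long lines arrive in several chunks; neither appears in the original answer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts logical lines on `file`. A line longer than the
 * buffer is delivered by fgets() in several chunks but counted only once.
 * Returns the number of lines, or -1 on an I/O failure. */
long count_lines(FILE *file)
{
    char line[16];      /* deliberately small, to exercise the long-line path */
    long lines = 0;
    int mid_line = 0;   /* nonzero while inside a line that has not ended yet */

    while (fgets(line, sizeof line, file) != NULL) {
        size_t len = strlen(line);
        if (!mid_line)
            lines++;    /* this chunk starts a new line */
        /* No '\n' at the end of the chunk: the line continues in the next
         * chunk, or the file ended without a final newline. */
        mid_line = (len == 0 || line[len - 1] != '\n');
    }
    if (ferror(file))
        return -1;      /* fgets() failed because of an I/O error */
    return lines;       /* fgets() failed because of end of file */
}
```

The loop condition is still the success of fgets(); ferror() is only consulted once the loop has ended.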

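feof() is not very intuitive. In my opinion, the FILE's end-of-file state should be set to true if any read operation results in the end of file being reached. Instead, you have to manually check whether the end of file has been reached after each read operation. For example, something like this will work if reading from a text file using fgetc():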

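#include <stdio.h>

int main(int argc, char *argv[])
{
  FILE *in = fopen("testfile.txt", "r");
  if (in == NULL) {  /* fopen() can fail */
    perror("testfile.txt");
    return 1;
  }

  while(1) {
    int c = fgetc(in);  /* int, not char, so that EOF is representable */
    if (feof(in) || ferror(in)) break;  /* also stop on a read error */
    printf("%c", c);
  }

  fclose(in);
  return 0;
}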

It would be great if something like this would work instead:

#include <stdio.h>

int main(int argc, char *argv[])
{
  FILE *in = fopen("testfile.txt", "r");

  while(!feof(in)) {
    printf("%c", fgetc(in));
  }

  fclose(in);
  return 0;
}

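TL;DR

while(!feof) is wrong because it tests for something that is irrelevant and fails to test for something that you need to know. The result is that you erroneously execute code that assumes it is accessing data that was read successfully, when in fact this never happened.

I'd like to provide an abstract, high-level perspective. So continue reading if you're interested in what while(!feof) actually does.

Concurrency and simultaneity

I/O operations interact with the environment. The environment is not part of your program, and not under your control. The environment truly exists "concurrently" with your program. As with all things concurrent, questions about the "current state" don't make sense: there is no concept of "simultaneity" across concurrent events. Many properties of state simply don't exist concurrently.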

Let me make this more precise: Suppose you want to ask, "do you have more data". You could ask this of a concurrent container, or of your I/O system. But the answer is generally unactionable, and thus meaningless. So what if the container says "yes" – by the time you try reading, it may no longer have data. Similarly, if the answer is "no", by the time you try reading, data may have arrived. The conclusion is that there simply is no property like "I have data", since you cannot act meaningfully in response to any possible answer. (The situation is slightly better with buffered input, where you might conceivably get a "yes, I have data" that constitutes some kind of guarantee, but you would still have to be able to deal with the opposite case. And with output the situation is certainly just as bad as I described: you never know if that disk or that network buffer is full.)

So we conclude that it is impossible, and in fact unreasonable, to ask an I/O system whether it will be able to perform an I/O operation. The only possible way we can interact with it (just as with a concurrent container) is to attempt the operation and check whether it succeeded or failed. At that moment where you interact with the environment, then and only then can you know whether the interaction was actually possible, and at that point you must commit to performing the interaction. (This is a "synchronisation point", if you will.)

EOF

Now we get to EOF. EOF is the response you get from an attempted I/O operation. It means that you were trying to read or write something, but when doing so you failed to read or write any data, and instead the end of the input or output was encountered. This is true for essentially all the I/O APIs, whether it be the C standard library, C++ iostreams, or other libraries. As long as the I/O operations succeed, you simply cannot know whether further, future operations will succeed. You must always first try the operation and then respond to success or failure.

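Examples

In each of the examples, note carefully that we first attempt the I/O operation and then consume the result if it is valid. Note further that we must always use the result of the I/O operation, though the result takes different shapes and forms in each example.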

C stdio, read from a file:

    for (;;) {
        size_t n = fread(buf, 1, bufsize, infile);
        consume(buf, n);
        if (n == 0) { break; }
    }

The result we must use is n, the number of elements that were read (which may be as little as zero).

C stdio, scanf:

    for (int a, b, c; scanf("%d %d %d", &a, &b, &c) == 3; ) {
        consume(a, b, c);
    }

The result we must use is the return value of scanf, the number of elements converted.

C++, iostreams formatted extraction:

    for (int n; std::cin >> n; ) {
        consume(n);
    }

The result we must use is std::cin itself, which can be evaluated in a boolean context and tells us whether the stream is still in the good() state.

C++, iostreams getline:

    for (std::string line; std::getline(std::cin, line); ) {
        consume(line);
    }

The result we must use is again std::cin, just as before.

POSIX, write(2) to flush a buffer:

    char const * p = buf;
    ssize_t n = bufsize;
    for (ssize_t k = bufsize; (k = write(fd, p, n)) > 0; p += k, n -= k) {}
    if (n != 0) { /* error, failed to write complete buffer */ }

The result we use here is k, the number of bytes written. The point here is that we can only know how many bytes were written after the write operation.

POSIX getline():

    char *buffer = NULL;
    size_t bufsiz = 0;
    ssize_t nbytes;
    while ((nbytes = getline(&buffer, &bufsiz, fp)) != -1)
    {
        /* Use nbytes of data in buffer */
    }
    free(buffer);

The result we must use is nbytes, the number of bytes up to and including the newline (or EOF if the file did not end with a newline). Note that the function explicitly returns -1 (and not EOF!) when an error occurs or it reaches EOF.

You may notice that we very rarely spell out the actual word "EOF". We usually detect the error condition in some other way that is more immediately interesting to us (e.g. failure to perform as much I/O as we had desired). In every example there is some API feature that could tell us explicitly that the EOF state has been encountered, but this is in fact not a terribly useful piece of information. It is much more of a detail than we often care about. What matters is whether the I/O succeeded, more-so than how it failed.
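To show where such an explicit query would sit in the C stdio case, here is a rough sketch of a stream copy in which the fread() count drives the loop and ferror() is asked for the reason only after fread() has returned zero. The function name copy_stream, the 4096-byte buffer, and the 0/-1 return convention are assumptions chosen for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: copies everything from `in` to `out`.
 * Returns 0 on success, -1 on a read or write failure. */
int copy_stream(FILE *in, FILE *out)
{
    char buf[4096];
    size_t n;

    /* The number of bytes actually read is the result we must use;
     * a short count still carries valid data. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;  /* write failed */
    }
    /* fread() returned 0: only now does it make sense to ask why. */
    if (ferror(in))
        return -1;      /* read error */
    return 0;           /* end of input; feof(in) is true at this point */
}
```

Note that the success path never needs to call feof() at all: once a read error has been ruled out, end of input is the only remaining reason for the zero count.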

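One final example that actually queries the EOF state: suppose you have a string and want to test that it represents an integer in its entirety, with no extra bits at the end except whitespace. Using C++ iostreams, it goes like this:

    std::string input = "   123   ";   // example

    std::istringstream iss(input);
    int value;
    if (iss >> value >> std::ws && iss.get() == EOF) {
        consume(value);
    } else {
        // error, "input" is not parsable as an integer
    }

We use two results here. The first is iss, the stream object itself, to check that the formatted extraction to value succeeded. But then, after also consuming whitespace, we perform another I/O operation, iss.get(), and expect it to fail as EOF, which is the case if the entire string has already been consumed by the formatted extraction.

In the C standard library you can achieve something similar with the strto*l functions by checking that the end pointer has reached the end of the input string.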
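A minimal sketch of that end-pointer check with strtol() might look as follows. The helper name parse_whole_int, the decision to allow trailing whitespace (mirroring the std::ws step above), and the range check against int are assumptions for this illustration.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parses `s` as a base-10 int, allowing only surrounding
 * whitespace. Returns 1 and stores the value on success, 0 otherwise. */
int parse_whole_int(const char *s, int *value)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);    /* skips leading whitespace itself */
    if (end == s)
        return 0;               /* no digits were converted */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;               /* value does not fit in an int */
    while (isspace((unsigned char)*end))
        end++;                  /* skip trailing whitespace */
    if (*end != '\0')
        return 0;               /* extra characters after the number */
    *value = (int)v;
    return 1;
}
```

As in the iostreams version, two results are used: whether the conversion consumed anything, and whether the end pointer then reaches the terminating null character.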