I want to read a text file line by line. I'd like to know whether I'm doing it as efficiently as possible within the scope of .NET C#.


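// The arguments after textFilePath are cut off in the source; FileMode.Open and FileAccess.Read are assumed.
var filestream = new System.IO.FileStream(textFilePath, System.IO.FileMode.Open, System.IO.FileAccess.Read);
var file = new System.IO.StreamReader(filestream, System.Text.Encoding.UTF8, true, 128);

string lineOfText;
while ((lineOfText = file.ReadLine()) != null)
{
    //Do something with the lineOfText
}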


There is a good discussion of this topic in the Stack Overflow question "Is 'yield return' slower than "old school" return?":


ReadAllLines loads all of the lines into memory and returns a string[]. All well and good if the file is small. If the file is larger than will fit in memory, you'll run out of memory. ReadLines, on the other hand, uses yield return to return one line at a time. With it, you can read any size file. It doesn't load the whole file into memory. Say you wanted to find the first line that contains the word "foo", and then exit. Using ReadAllLines, you'd have to read the entire file into memory, even if "foo" occurs on the first line. With ReadLines, you only read one line. Which one would be faster?
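The "foo" search described above can be sketched roughly as follows. This is only an illustration: `FirstMatch` and `FindFirst` are hypothetical names, not part of the quoted answer. It relies on `File.ReadLines` being lazy, so enumeration stops as soon as `FirstOrDefault` finds a match.

```csharp
using System;
using System.IO;
using System.Linq;

static class FirstMatch
{
    // Hypothetical helper: returns the first line containing `needle`,
    // or null if no line does. File.ReadLines yields lines lazily, so
    // lines after the first match are never enumerated.
    public static string FindFirst(string path, string needle)
    {
        return File.ReadLines(path).FirstOrDefault(line => line.Contains(needle));
    }
}
```

Swapping `File.ReadLines` for `File.ReadAllLines` here would give the same result, but only after the whole file had been loaded into a `string[]`.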


While File.ReadAllLines() is one of the simplest ways to read a file, it is also one of the slowest.


using (StreamReader sr = File.OpenText(fileName))
{
    string s = String.Empty;
    while ((s = sr.ReadLine()) != null)
    {
        //do minimal amount of work here
    }
}


AllLines = new string[MAX]; //only allocate memory here

using (StreamReader sr = File.OpenText(fileName))
{
    int x = 0;
    while (!sr.EndOfStream)
    {
        AllLines[x] = sr.ReadLine();
        x += 1;
    }
} //Finished. Close the file

//Now parallel process each line in the file
Parallel.For(0, AllLines.Length, x =>
{
    DoYourStuff(AllLines[x]); //do your work here
});

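If you're using .NET 4, simply use File.ReadLines, which does all of this for you. I suspect it is much the same as yours, except that it may also use FileOptions.SequentialScan and a larger buffer (128 seems very small).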
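For anyone who wants to try those two tweaks by hand rather than rely on what File.ReadLines may or may not do internally, here is a minimal sketch. The names `LineReading` and `ReadLinesSequential` are hypothetical, and the 64 KiB default buffer is an arbitrary assumption, not a measured optimum.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class LineReading
{
    // Hypothetical helper: lazily yields lines from a file opened with
    // FileOptions.SequentialScan and a caller-chosen buffer size.
    // The 65536 default is an assumption for illustration only.
    public static IEnumerable<string> ReadLinesSequential(string path, int bufferSize = 65536)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, bufferSize, FileOptions.SequentialScan))
        using (var reader = new StreamReader(stream, Encoding.UTF8, true, bufferSize))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}
```

It can be consumed like File.ReadLines, for example `foreach (var line in LineReading.ReadLinesSequential(textFilePath)) { ... }`. Whether it is actually faster depends on the file and the disk, so it is worth measuring before adopting it.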



    using System;
    using System.IO;
    using System.Linq;

    //can return empty lines sometimes
    class LinePortionTextReader : IDisposable
    {
        private const int BUFFER_SIZE = 100000000; //100M characters
        StreamReader sr = null;
        string remainder = "";

        public LinePortionTextReader(string filePath)
        {
            if (File.Exists(filePath))
            {
                sr = new StreamReader(filePath);
                remainder = "";
            }
        }

        //the enclosing member was lost in the source; Dispose is assumed here
        public void Dispose()
        {
            if(null != sr) { sr.Close(); }
        }

        public string[] ReadBlock()
        {
            if(null==sr) { return new string[] { }; }
            char[] buffer = new char[BUFFER_SIZE];
            int charactersRead = sr.ReadBlock(buffer, 0, BUFFER_SIZE); //ReadBlock fills the buffer unless the stream ends
            if (charactersRead < 1) { return new string[] { }; }
            //also treat a block that ends exactly at end of file as the last one, so the final line is not lost
            bool lastPart = (charactersRead < BUFFER_SIZE) || sr.EndOfStream;
            if (lastPart)
            {
                char[] buffer2 = buffer.Take<char>(charactersRead).ToArray();
                buffer = buffer2;
            }
            string s = new string(buffer);
            string[] sresult = s.Split(new string[] { "\r\n" }, StringSplitOptions.None);
            sresult[0] = remainder + sresult[0];
            if (!lastPart)
            {
                remainder = sresult[sresult.Length - 1]; //incomplete line, completed by the next block
                sresult[sresult.Length - 1] = "";
            }
            return sresult;
        }

        public bool EOS
        {
            get { return (null == sr) ? true: sr.EndOfStream; }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length < 3)
            {
                Console.WriteLine("multifind.exe <where to search> <what to look for, one value per line> <where to put the result>");
                return;
            }

            if (!File.Exists(args[0]))
            {
                Console.WriteLine("source file not found");
                return;
            }
            if (!File.Exists(args[1]))
            {
                Console.WriteLine("reference file not found");
                return;
            }

            using (TextWriter tw = new StreamWriter(args[2], false))
            using (LinePortionTextReader lptr = new LinePortionTextReader(args[0]))
            {
                string[] refLines = File.ReadAllLines(args[1]);
                int blockCounter = 0;
                while (!lptr.EOS)
                {
                    string[] srcLines = lptr.ReadBlock();
                    for (int i = 0; i < srcLines.Length; i += 1)
                    {
                        string theLine = srcLines[i];
                        if (!string.IsNullOrEmpty(theLine)) //can return empty lines sometimes
                        {
                            for (int j = 0; j < refLines.Length; j += 1)
                            {
                                if (theLine.Contains(refLines[j]))
                                {
                                    //body missing in the source; writing the matching line to the result file is assumed
                                    tw.WriteLine(theLine);
                                    break;
                                }
                            }
                        }
                    }

                    blockCounter += 1;
                    Console.WriteLine(String.Format("100 Mb blocks processed: {0}", blockCounter));
                }
            }
        }
    }

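I'm sure the string splitting and array handling could be improved significantly; however, the goal here was to minimize the number of disk reads.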