关于使用fs.readdir进行异步目录搜索有什么想法吗?我意识到我们可以引入递归,并调用read目录函数来读取下一个目录,但我有点担心它不是异步的…

什么好主意吗?我已经看了node-walk,它很棒,但它不能像readdir那样只给我数组中的文件。虽然

寻找这样的输出…

['file1.txt', 'file2.txt', 'dir/file3.txt']

当前回答

这就是我的答案。希望它能帮助到一些人。

我的重点是使搜索例程可以停在任何地方,对于找到的文件,告诉原始路径的相对深度。

var _fs = require('fs');
var _path = require('path');
var _defer = process.nextTick;

// next() will pop the first element from an array and return it, together with
// the recursive depth and the container array of the element. i.e. If the first
// element is an array, it'll be dug into recursively. But if the first element is
// an empty array, it'll be simply popped and ignored.
// e.g. If the original array is [1,[2],3], next() will return [1,0,[[2],3]], and
// the array becomes [[2],3]. If the array is [[[],[1,2],3],4], next() will return
// [1,2,[2]], and the array becomes [[[2],3],4].
// There is an infinity loop `while(true) {...}`, because I optimized the code to
// make it a non-recursive version.
var next = function(c) {
    var a = c;
    var n = 0;
    while (true) {
        if (a.length == 0) return null;
        var x = a[0];
        if (x.constructor == Array) {
            if (x.length > 0) {
                a = x;
                ++n;
            } else {
                a.shift();
                a = c;
                n = 0;
            }
        } else {
            a.shift();
            return [x, n, a];
        }
    }
}

// cb is the callback function, it have four arguments:
//    1) an error object if any exception happens;
//    2) a path name, may be a directory or a file;
//    3) a flag, `true` means directory, and `false` means file;
//    4) a zero-based number indicates the depth relative to the original path.
// cb should return a state value to tell whether the searching routine should
// continue: `true` means it should continue; `false` means it should stop here;
// but for a directory, there is a third state `null`, means it should do not
// dig into the directory and continue searching the next file.
var ls = function(path, cb) {
    // use `_path.resolve()` to correctly handle '.' and '..'.
    var c = [ _path.resolve(path) ];
    var f = function() {
        var p = next(c);
        p && s(p);
    };
    var s = function(p) {
        _fs.stat(p[0], function(err, ss) {
            if (err) {
                // use `_defer()` to turn a recursive call into a non-recursive call.
                cb(err, p[0], null, p[1]) && _defer(f);
            } else if (ss.isDirectory()) {
                var y = cb(null, p[0], true, p[1]);
                if (y) r(p);
                else if (y == null) _defer(f);
            } else {
                cb(null, p[0], false, p[1]) && _defer(f);
            }
        });
    };
    var r = function(p) {
        _fs.readdir(p[0], function(err, files) {
            if (err) {
                cb(err, p[0], true, p[1]) && _defer(f);
            } else {
                // not use `Array.prototype.map()` because we can make each change on site.
                for (var i = 0; i < files.length; i++) {
                    files[i] = _path.join(p[0], files[i]);
                }
                p[2].unshift(files);
                _defer(f);
            }
        });
    }
    _defer(f);
};

var printfile = function(err, file, isdir, n) {
    if (err) {
        console.log('-->   ' + ('[' + n + '] ') + file + ': ' + err);
        return true;
    } else {
        console.log('... ' + ('[' + n + '] ') + (isdir ? 'D' : 'F') + ' ' + file);
        return true;
    }
};

var path = process.argv[2];
ls(path, printfile);

其他回答

There are basically two ways of accomplishing this. In an async environment you'll notice that there are two kinds of loops: serial and parallel. A serial loop waits for one iteration to complete before it moves onto the next iteration - this guarantees that every iteration of the loop completes in order. In a parallel loop, all the iterations are started at the same time, and one may complete before another, however, it is much faster than a serial loop. So in this case, it's probably better to use a parallel loop because it doesn't matter what order the walk completes in, just as long as it completes and returns the results (unless you want them in order).

一个平行循环看起来是这样的:

var fs = require('fs');
var path = require('path');
var walk = function(dir, done) {
  var results = [];
  fs.readdir(dir, function(err, list) {
    if (err) return done(err);
    var pending = list.length;
    if (!pending) return done(null, results);
    list.forEach(function(file) {
      file = path.resolve(dir, file);
      fs.stat(file, function(err, stat) {
        if (stat && stat.isDirectory()) {
          walk(file, function(err, res) {
            results = results.concat(res);
            if (!--pending) done(null, results);
          });
        } else {
          results.push(file);
          if (!--pending) done(null, results);
        }
      });
    });
  });
};

一个串行循环看起来像这样:

var fs = require('fs');
var path = require('path');
var walk = function(dir, done) {
  var results = [];
  fs.readdir(dir, function(err, list) {
    if (err) return done(err);
    var i = 0;
    (function next() {
      var file = list[i++];
      if (!file) return done(null, results);
      file = path.resolve(dir, file);
      fs.stat(file, function(err, stat) {
        if (stat && stat.isDirectory()) {
          walk(file, function(err, res) {
            results = results.concat(res);
            next();
          });
        } else {
          results.push(file);
          next();
        }
      });
    })();
  });
};

并且在你的主目录中测试它(警告:如果你的主目录中有很多东西,结果列表将会非常大):

walk(process.env.HOME, function(err, results) {
  if (err) throw err;
  console.log(results);
});

编辑:改进的示例。

有一个名为cup-readdir的新模块,可以快速递归地搜索目录。它使用异步承诺,在处理深层目录结构时性能优于许多流行的模块。

它可以返回数组中的所有文件,并根据它们的属性对它们进行排序,但缺乏文件过滤和进入符号链接目录等功能。这对于只想从目录中获取每个文件的大型项目非常有用。这里是他们项目主页的链接。

如果你想使用npm包,扳手是很好的选择。

var wrench = require("wrench");

var files = wrench.readdirSyncRecursive("directory");

wrench.readdirRecursive("directory", function (error, files) {
    // live your dreams
});

编辑(2018): 作者在2015年弃用了这个包:

扳手.js已弃用,并且在相当长的一段时间内没有更新。我强烈建议使用fs-extra来执行任何额外的文件系统操作。

使用async/await,这应该工作:

const FS = require('fs');
const readDir = promisify(FS.readdir);
const fileStat = promisify(FS.stat);

async function getFiles(dir) {
    let files = await readDir(dir);

    let result = files.map(file => {
        let path = Path.join(dir,file);
        return fileStat(path).then(stat => stat.isDirectory() ? getFiles(path) : path);
    });

    return flatten(await Promise.all(result));
}

function flatten(arr) {
    return Array.prototype.concat(...arr);
}

你可以用蓝鸟。许诺或许诺:

/**
 * Returns a function that will wrap the given `nodeFunction`. Instead of taking a callback, the returned function will return a promise whose fate is decided by the callback behavior of the given node function. The node function should conform to node.js convention of accepting a callback as last argument and calling that callback with error as the first argument and success value on the second argument.
 *
 * @param {Function} nodeFunction
 * @returns {Function}
 */
module.exports = function promisify(nodeFunction) {
    return function(...args) {
        return new Promise((resolve, reject) => {
            nodeFunction.call(this, ...args, (err, data) => {
                if(err) {
                    reject(err);
                } else {
                    resolve(data);
                }
            })
        });
    };
};

Node 8+内置了Promisify

请参阅我对生成器方法的其他回答,它可以更快地得到结果。

简单,基于异步承诺


const fs = require('fs/promises');
const getDirRecursive = async (dir) => {
    try {
        const items = await fs.readdir(dir);
        let files = [];
        for (const item of items) {
            if ((await fs.lstat(`${dir}/${item}`)).isDirectory()) files = [...files, ...(await getDirRecursive(`${dir}/${item}`))];
            else files.push({file: item, path: `${dir}/${item}`, parents: dir.split("/")});
        }
        return files;
    } catch (e) {
        return e
    }
};

用法:await getDirRecursive("./public");