How To Stream Read Directory In Node.js?
Solution 1:
On modern computers, traversing a directory with 500K files is not a problem. When you call fs.readdir asynchronously in Node.js, it only reads the list of file names in the specified directory; it does not read the files' contents. I've just tested with 700K files in the directory, and loading the list of file names took only 21MB of memory.
Once you've loaded the list, you can traverse the names one by one, or in parallel with a concurrency limit, and easily consume them all. Example:
var async = require('async'),
    fs = require('fs'),
    path = require('path'),
    parentDir = '/home/user';

async.waterfall([
  function (cb) {
    fs.readdir(parentDir, cb);
  },
  function (files, cb) {
    // `files` is just an array of file names, not full paths.
    // Consume 10 files in parallel.
    async.eachLimit(files, 10, function (filename, done) {
      var filePath = path.join(parentDir, filename);
      // Do whatever you want with this file.
      // Then don't forget to call `done()`.
      done();
    }, cb);
  }
], function (err) {
  err && console.trace(err);
  console.log('Done');
});
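The same read-then-limit idea can be sketched without the async library, using fs.promises.readdir and a small pool of workers. This is only a minimal sketch: the helper name `eachFileLimit` and its `handler` parameter are made up for illustration and are not part of Node or the answer above.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: loads all file names in `parentDir`, then calls
// `handler(fullPath)` for each one with at most `limit` calls in flight.
async function eachFileLimit(parentDir, limit, handler) {
  // Only the names are loaded here, not the files' contents.
  const files = await fs.promises.readdir(parentDir);
  let next = 0;

  // Each worker pulls the next unclaimed name until the list is exhausted.
  async function worker() {
    while (next < files.length) {
      const filename = files[next++];
      await handler(path.join(parentDir, filename));
    }
  }

  const workerCount = Math.min(limit, files.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return files.length;
}
```

It could be called as `eachFileLimit('/home/user', 10, async (filePath) => { /* ... */ })`. The whole list of names still sits in memory, exactly as in the example above; only the processing is throttled.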
Solution 2:
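Since Node.js 12.12 there is a way to do it with async iteration! Inside an async function (or an ES module with top-level await), you can do:
const fs = require('fs')
const dir = fs.opendirSync('/tmp')
for await (const file of dir) {
  console.log(file.name)
}
To turn it into a stream (open a fresh dir first, since the loop above consumes and closes it):
const util = require('util')
const { pipeline, Readable } = require('stream')
const _pipeline = util.promisify(pipeline)
await _pipeline([
  Readable.from(dir),
  ... // consume!
])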
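To make the stream variant concrete, here is one possible complete version with a consumer attached. It is a sketch under assumptions: the helper `streamDirectory` and its `onEntry` callback are hypothetical names chosen for this example, and the Writable is just one way to consume the entries.

```javascript
const fs = require('fs');
const util = require('util');
const { pipeline, Readable, Writable } = require('stream');

const pipelineAsync = util.promisify(pipeline);

// Hypothetical helper: streams the entries of `dirPath` into `onEntry`,
// one fs.Dirent at a time.
async function streamDirectory(dirPath, onEntry) {
  const dir = await fs.promises.opendir(dirPath);
  await pipelineAsync(
    // Readable.from turns the async-iterable Dir into an object-mode stream.
    Readable.from(dir),
    new Writable({
      objectMode: true,
      write(entry, _encoding, callback) {
        // The next entry is requested only after onEntry settles,
        // so a slow consumer slows the directory read down.
        Promise.resolve()
          .then(() => onEntry(entry))
          .then(() => callback(), callback);
      },
    })
  );
}
```

For example, `await streamDirectory('/tmp', (entry) => console.log(entry.name))` would print each name as it arrives.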
Solution 3:
As of version 10, there is still no good solution for this. Node is just not that mature yet.
Modern filesystems can easily handle millions of files in a directory. And of course you can make a good case for it in large-scale operations, as you suggest.
The underlying C library iterates over the directory list one entry at a time, as it should. But all the Node implementations I have seen that claim to iterate use fs.readdir, which reads everything into memory as fast as it can.
As I understand it, you have to wait for a new version of libuv to be adopted into Node, and then for the maintainers to address this old issue. See the discussion at https://github.com/nodejs/node/issues/583
Some improvements will happen with version 12.
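For readers on Node.js 12.12 or later, fs.opendir exposes one-entry-at-a-time reads through dir.read(). The sketch below stops after a fixed number of entries instead of loading the whole listing; the helper name `firstEntries` is hypothetical, and Node may still buffer a small batch of entries internally.

```javascript
const fs = require('fs');

// Hypothetical helper: returns up to `count` entry names from `dirPath`
// without requesting the rest of the listing.
async function firstEntries(dirPath, count) {
  const dir = await fs.promises.opendir(dirPath);
  const names = [];
  try {
    while (names.length < count) {
      const entry = await dir.read(); // resolves to null once exhausted
      if (entry === null) break;
      names.push(entry.name);
    }
  } finally {
    // With manual read() calls the handle must be closed explicitly.
    await dir.close();
  }
  return names;
}
```

Calling `await firstEntries('/tmp', 100)` would return at most 100 names, regardless of how many files the directory holds.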