# 2021 Update

Consider using [`@root/walk`](https://npmjs.org/package/@root/walk) instead.

I created `walk` quite literally a decade ago, in the Node v0.x days.
Back then using an EventEmitter seemed like the thing to do. Nowadays, it seems a bit overkill for the simple task of walking over directories.

There's nothing wrong with `walk` - it's about the same as it was 10 years ago - however, at only 50 lines of code long, `@root/walk` is much simpler and much faster.

# node-walk

| a [Root](https://rootprojects.org) project

A Node.js walk implementation.

This is somewhat of a port of Python's `os.walk`, but using Node.js conventions.

- EventEmitter
- Asynchronous
- Chronological (optionally)
- Built-in flow-control
- Includes a synchronous version (same API as the asynchronous one)

As few file descriptors are opened at a time as possible.
This is particularly well suited for single hard disks which are not flash or solid state.

## Installation

```bash
npm install --save walk
```

# Getting Started

```javascript
'use strict';

var walk = require('walk');
var fs = require('fs');
var path = require('path');
var walker;
var options = {};

walker = walk.walk('/tmp', options);

walker.on('file', function (root, fileStats, next) {
  // fileStats.name is only the base name, so join it with root
  fs.readFile(path.join(root, fileStats.name), function () {
    // doStuff
    next();
  });
});

walker.on('errors', function (root, nodeStatsArray, next) {
  next();
});

walker.on('end', function () {
  console.log('all done');
});
```

## Common Events

All single event callbacks are in the form of `function (root, stat, next) {}`.

All multiple event callbacks are in the form of `function (root, stats, next) {}`, except **names**, which receives an array of strings.

All **error** event callbacks are in the form `function (root, stat/stats, next) {}`.
**`stat.error`** contains the error.
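As a rough sketch of how `stat.error` might be read inside an `errors` listener, the helper below turns the array of failed nodes into one line per node. `describeErrors` is a hypothetical name, not part of `walk`, and the sample data only imitates the documented `name`, `type` and `error` attributes so the snippet runs without a walker.

```javascript
'use strict';

var path = require('path');

// Hypothetical helper (not part of `walk`): builds one readable line per
// failed node from the array an `errors` listener receives.
function describeErrors(root, nodeStatsArray) {
  return nodeStatsArray.map(function (stat) {
    var reason = stat.error && stat.error.message ? stat.error.message : String(stat.error);
    return path.join(root, stat.name) + ': ' + reason;
  });
}

// Stand-in data shaped like the documented arguments (an assumption for illustration).
var lines = describeErrors('/tmp', [
  { name: 'secret', type: 'directory', error: new Error('EACCES: permission denied') }
]);

lines.forEach(function (line) {
  console.error(line);
});

// With a real walker it could be wired up like this:
// walker.on('errors', function (root, nodeStatsArray, next) {
//   describeErrors(root, nodeStatsArray).forEach(function (line) {
//     console.error(line);
//   });
//   next();
// });
```

Keeping the formatting in a plain function means the listener itself stays a thin wrapper that only logs and calls `next()`.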
- `names`
- `directory`
- `directories`
- `file`
- `files`
- `end`
- `nodeError` (`stat` failed)
- `directoryError` (`stat` succeeded, but `readdir` failed)
- `errors` (a collection of any errors encountered)

A typical `stat` object looks like this:

```javascript
{ dev: 16777223,
  mode: 33188,
  nlink: 1,
  uid: 501,
  gid: 20,
  rdev: 0,
  blksize: 4096,
  ino: 49868100,
  size: 5617,
  blocks: 16,
  atime: Mon Jan 05 2015 18:18:10 GMT-0700 (MST),
  mtime: Thu Sep 25 2014 21:21:28 GMT-0600 (MDT),
  ctime: Thu Sep 25 2014 21:21:28 GMT-0600 (MDT),
  birthtime: Thu Sep 25 2014 21:21:28 GMT-0600 (MDT),
  name: 'README.md',
  type: 'file' }
```

# Advanced Example

Both Asynchronous and Synchronous versions are provided.

```javascript
'use strict';

var walk = require('walk');
var fs = require('fs');
var path = require('path');
var options;
var walker;

options = {
  followLinks: false,
  // directories with these keys will be skipped
  filters: ['Temp', '_Temp'],
};

walker = walk.walk('/tmp', options);

// OR
// walker = walk.walkSync("/tmp", options);

walker.on('names', function (root, nodeNamesArray) {
  nodeNamesArray.sort(function (a, b) {
    if (a > b) return 1;
    if (a < b) return -1;
    return 0;
  });
});

walker.on('directories', function (root, dirStatsArray, next) {
  // dirStatsArray is an array of `stat` objects with the additional attributes
  // * type
  // * error
  // * name

  next();
});

walker.on('file', function (root, fileStats, next) {
  // fileStats.name is only the base name, so join it with root
  fs.readFile(path.join(root, fileStats.name), function () {
    // doStuff
    next();
  });
});

walker.on('errors', function (root, nodeStatsArray, next) {
  next();
});

walker.on('end', function () {
  console.log('all done');
});
```

### Sync

Note: You **can't use EventEmitter** if you want a truly synchronous walker (although it's synchronous under the hood, it appears not to be due to the use of `process.nextTick()`).

Instead **you must use `options.listeners`** for a truly synchronous walker.

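One way to picture why the listeners have to be supplied up front for a fully synchronous run: if every event is emitted before the emitter is handed back, a listener attached afterwards has nothing left to hear. The sketch below uses only Node's built-in `events` module; `emitDuringCreation` is a hypothetical stand-in for illustration and is not how `walk` is implemented.

```javascript
'use strict';

var EventEmitter = require('events').EventEmitter;

// Hypothetical stand-in for a synchronous walker (NOT walk's implementation):
// it emits every event before returning the emitter to the caller.
function emitDuringCreation(names, listeners) {
  var emitter = new EventEmitter();

  // Register any listeners that were passed in before emitting anything.
  Object.keys(listeners || {}).forEach(function (event) {
    emitter.on(event, listeners[event]);
  });

  names.forEach(function (name) {
    emitter.emit('file', name);
  });
  emitter.emit('end');

  return emitter;
}

var seenUpFront = [];
var seenLate = [];

// Listeners passed in up front see every event.
emitDuringCreation(['a.txt', 'b.txt'], {
  file: function (name) {
    seenUpFront.push(name);
  }
});

// Listeners attached after creation miss the events that already fired.
emitDuringCreation(['a.txt', 'b.txt']).on('file', function (name) {
  seenLate.push(name);
});

console.log(seenUpFront.length, seenLate.length); // prints "2 0"
```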
Although the sync version uses all of the `fs.readSync`, `fs.readdirSync`, and other sync methods, I don't think I can prevent the `process.nextTick()` that `EventEmitter` calls.

```javascript
(function () {
  'use strict';

  var walk = require('walk');
  var fs = require('fs');
  var path = require('path');
  var options;
  var walker;

  // To be truly synchronous in the emitter and maintain a compatible api,
  // the listeners must be listed before the object is created
  options = {
    listeners: {
      names: function (root, nodeNamesArray) {
        nodeNamesArray.sort(function (a, b) {
          if (a > b) return 1;
          if (a < b) return -1;
          return 0;
        });
      },
      directories: function (root, dirStatsArray, next) {
        // dirStatsArray is an array of `stat` objects with the additional attributes
        // * type
        // * error
        // * name

        next();
      },
      file: function (root, fileStats, next) {
        // fileStats.name is only the base name, so join it with root
        fs.readFile(path.join(root, fileStats.name), function () {
          // doStuff
          next();
        });
      },
      errors: function (root, nodeStatsArray, next) {
        next();
      },
    },
  };

  walker = walk.walkSync('/tmp', options);

  console.log('all done');
})();
```

# API

Emitted Values

- `on('XYZ', function(root, stats, next) {})`

- `root` - the directory containing the files to be inspected
- _stats[Array]_ - a single `stats` object or an array with some added attributes
  - type - 'file', 'directory', etc
  - error
  - name - the name of the file, dir, etc
- next - no more files will be read until this is called

Single Events - fired immediately

- `end` - No files, dirs, etc left to inspect

- `directoryError` - Error when `fstat` succeeded, but reading the path failed (probably due to permissions).
- `nodeError` - Error when `fstat` did not succeed.
- `node` - a `stats` object for a node of any type
- `file` - includes links when `followLinks` is `true`
- `directory` - **NOTE** you could get a recursive loop if `followLinks` is `true` and a directory links to its parent
- `symbolicLink` - always empty when `followLinks` is `true`
- `blockDevice`
- `characterDevice`
- `FIFO`
- `socket`

Events with Array Arguments - fired after all files in the dir have been `stat`ed

- `names` - before any `stat` takes place. Useful for sorting and filtering.

  - Note: the array is an array of `string`s, not `stat` objects
  - Note: the `next` argument is a `noop`

- `errors` - errors encountered by `fs.stat` when reading nodes in a directory
- `nodes` - an array of `stats` of any type
- `files`
- `directories` - modification of this array - sorting, removing, etc - affects traversal
- `symbolicLinks`
- `blockDevices`
- `characterDevices`
- `FIFOs`
- `sockets`

**Warning** beware of infinite loops when `followLinks` is true (using the `walk-recurse` variant).

# Comparisons

Tested on my `/System` containing 59,490 (+ self) directories (and lots of files).
The size of the text output was 6 MB.

`find`:

    time bash -c "find /System -type d | wc"
        59491   97935 6262916

    real  2m27.114s
    user  0m1.193s
    sys   0m14.859s

`find.js`:

Note that `find.js` omits the start directory

    time bash -c "node examples/find.js /System -type d | wc"
        59490   97934 6262908

    # Test 1
    real  2m52.273s
    user  0m20.374s
    sys   0m27.800s

    # Test 2
    real  2m23.725s
    user  0m18.019s
    sys   0m23.202s

    # Test 3
    real  2m50.077s
    user  0m17.661s
    sys   0m24.008s

In conclusion, the Node.js asynchronous walk is slower than regular `find`: in these runs wall-clock time ranged from about the same to roughly 17% longer, and user CPU time was roughly 15 to 17 times higher.

# LICENSE

`node-walk` is available under the following licenses:

- MIT
- Apache 2

Copyright 2011 - Present AJ ONeal