WalkDir

Module author: Nick Coghlan <ncoghlan@gmail.com>

The standard library’s os.walk() iterator provides a convenient way to process the contents of a filesystem directory. This module provides higher level tools based on the same interface that support filtering, depth limiting and handling of symlink loops. The module also offers tools that flatten the os.walk() API into a simple iteration over filesystem paths.

Walk Iterables

In this module, walk_iter refers to any iterable that produces (dirpath, subdirs, files) triples in the style of os.walk().

The module is designed so that all purely filtering operations preserve the output of the underlying iterable. This means that named tuples, tuples containing more than 3 values, or objects that aren’t tuples at all but still support indexing such that x[0], x[1] and x[2] are dirpath, subdirs and files can be filtered without being converted to ordinary 3-tuples.

Changed in version 0.3: Objects produced by underlying iterables are now preserved by filtering operations instead of being coerced to ordinary 3-tuples.
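A minimal sketch of why indexing is enough to preserve the underlying objects, assuming a filter that edits the file list of each entry in place: because the list inside the entry is mutated rather than a new tuple being built, a namedtuple entry leaves the filter as the very same object. WalkEntry and keep_py_files are hypothetical names used only for illustration; they are not part of WalkDir.

```python
import fnmatch
from collections import namedtuple

# Hypothetical entry type: any object where x[0], x[1], x[2] are
# dirpath, subdirs, files would be handled the same way.
WalkEntry = namedtuple("WalkEntry", "dirpath subdirs files")


def keep_py_files(walk_iter):
    """Illustrative filter: keep only *.py files, touching entries by index only."""
    for entry in walk_iter:
        files = entry[2]
        # Slice assignment mutates the existing list, so the entry object
        # itself is yielded unchanged instead of being rebuilt as a 3-tuple.
        files[:] = [name for name in files if fnmatch.fnmatch(name, "*.py")]
        yield entry


walk = [
    WalkEntry(".", ["docs"], ["setup.py", "README.txt"]),
    WalkEntry("./docs", [], ["conf.py", "index.rst"]),
]
filtered = list(keep_py_files(walk))
for entry in filtered:
    print(entry.dirpath, entry.files)
```

The namedtuple itself is immutable, but the lists it holds are not, which is what makes in-place filtering possible without coercion.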

Path Iteration

Three iterators are provided for iteration over filesystem paths:

all_paths(walk_iter)[source]

Iterate over both files and directories visited by the underlying walk

dir_paths(walk_iter)[source]

Iterate over just the directory names visited by the underlying walk

file_paths(walk_iter)[source]

Iterate over the files in directories visited by the underlying walk

Directory Walking

A convenience API for walking directories with various options is provided:

filtered_walk(top, included_files=None, included_dirs=None, excluded_files=None, excluded_dirs=None, depth=None, followlinks=False, min_depth=None)[source]

This is a wrapper around os.walk(), with these additional features:
  • top may be either a string (which will be passed to os.walk()) or any iterable that produces path, subdirs, files triples
  • allows independent glob-style filters for filenames and subdirectories
  • allows a recursion depth limit to be specified
  • emits a message to stderr and skips the directory if a symlink loop is encountered when following links

Filtered walks are always top down, as the subdirectory listings must be altered to provide a number of the above features.

included_files, included_dirs, excluded_files and excluded_dirs are used to apply the corresponding filtering steps (include_files(), include_dirs(), exclude_files() and exclude_dirs()) to the walk.

A depth of None (the default) disables depth limiting. Otherwise, depth must be at least zero and indicates how far to descend into the directory hierarchy. A depth of zero is useful to get separate filtered subdirectory and file listings for top.

Setting min_depth allows directories higher in the tree to be excluded from the walk (e.g. a min_depth of 1 excludes top, but its subdirectories are still processed).

followlinks enables symlink loop detection and is also passed to os.walk() when top is a string.
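The reason filtered walks are always top down comes from os.walk() itself: with topdown=True (the default) the caller may edit the subdirectory list in place to control which directories are visited next, whereas with topdown=False the subdirectories have already been walked by the time the list is seen. A standard-library-only sketch of that mechanism follows; walk_without is a hypothetical helper, not a WalkDir function.

```python
import os
import tempfile


def walk_without(top, skipped):
    """Top-down os.walk() that never descends into directories named in skipped."""
    for dirpath, subdirs, files in os.walk(top):  # topdown=True by default
        # Editing the list in place tells os.walk() which children to visit;
        # rebinding the name (subdirs = [...]) would have no effect.
        subdirs[:] = [d for d in subdirs if d not in skipped]
        yield dirpath, subdirs, files


with tempfile.TemporaryDirectory() as top:
    for sub in ("docs", ".hg", os.path.join(".hg", "store")):
        os.makedirs(os.path.join(top, sub))
    visited = sorted(os.path.relpath(path, top)
                     for path, _, _ in walk_without(top, {".hg"}))

print(visited)
```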

The individual operations that support the convenience API are exposed using an itertools style iterator pipeline model:

include_dirs(walk_iter, *include_filters)[source]

Use fnmatch.fnmatch() patterns to select directories of interest

Inclusion filters are passed directly as arguments

include_files(walk_iter, *include_filters)[source]

Use fnmatch.fnmatch() patterns to select files of interest

Inclusion filters are passed directly as arguments
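The inclusion filters can be sketched as stages in a generator pipeline, assuming each stage prunes the relevant list of every entry in place with fnmatch.fnmatch() and passes the entry through. The sketch_* names and the sample data are hypothetical. In this sketch a directory filter only edits the subdirectory list, so it steers where a top-down walk goes next rather than removing the directory currently being yielded.

```python
import fnmatch


def _matches_any(name, patterns):
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


def sketch_include_files(walk_iter, *include_filters):
    for entry in walk_iter:
        # Keep only the file names matching at least one pattern.
        entry[2][:] = [f for f in entry[2] if _matches_any(f, include_filters)]
        yield entry


def sketch_include_dirs(walk_iter, *include_filters):
    for entry in walk_iter:
        # Keep only the subdirectories matching at least one pattern.
        entry[1][:] = [d for d in entry[1] if _matches_any(d, include_filters)]
        yield entry


walk = [(".", ["docs", "build", "dist"], ["setup.py", "NEWS.rst", "logo.png"])]
pipeline = sketch_include_dirs(sketch_include_files(walk, "*.py", "*.rst"), "d*")
result = list(pipeline)
print(result)
```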

exclude_dirs(walk_iter, *exclude_filters)[source]

Use fnmatch.fnmatch() patterns to skip irrelevant directories

Exclusion filters are passed directly as arguments

exclude_files(walk_iter, *exclude_filters)[source]

Use fnmatch.fnmatch() patterns to skip irrelevant files

Exclusion filters are passed directly as arguments
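The exclusion filters can be sketched the same way with the test inverted. Here the hypothetical sketch_exclude_dirs() and sketch_exclude_files() drop version-control and bytecode clutter, mirroring the excluded_dirs=['__pycache__', '.hg'] arguments used in the examples; the sample walk data is made up.

```python
import fnmatch


def _matches_any(name, patterns):
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


def sketch_exclude_dirs(walk_iter, *exclude_filters):
    for entry in walk_iter:
        # Drop matching subdirectories so a top-down walk never enters them.
        entry[1][:] = [d for d in entry[1] if not _matches_any(d, exclude_filters)]
        yield entry


def sketch_exclude_files(walk_iter, *exclude_filters):
    for entry in walk_iter:
        # Drop matching file names from the listing.
        entry[2][:] = [f for f in entry[2] if not _matches_any(f, exclude_filters)]
        yield entry


walk = [(".", ["docs", "__pycache__", ".hg"], ["walkdir.py", "walkdir.pyc"])]
result = list(sketch_exclude_files(sketch_exclude_dirs(walk, "__pycache__", ".hg"), "*.pyc"))
print(result)
```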

limit_depth(walk_iter, depth)[source]

Limit the depth of recursion into subdirectories.

A depth of 0 limits the walk to the top level directory, a depth of 1 includes subdirectories, etc.

Path depth is calculated by counting directory separators, using the depth of the first path produced by the underlying iterator as a reference point.
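A sketch of separator-based depth limiting following the description above: the first dirpath sets the reference depth, subdirectory lists are emptied once the limit is reached so a top-down walk stops descending, and any deeper entries from an iterable that ignores pruning are dropped. sketch_limit_depth and make_walk are hypothetical, and counting only os.sep is an assumption (paths using os.altsep on Windows would need extra handling).

```python
import os


def sketch_limit_depth(walk_iter, depth):
    if depth < 0:
        raise ValueError("depth must be at least zero")
    base = None
    for entry in walk_iter:
        current = entry[0].count(os.sep)
        if base is None:
            base = current  # the first path is the reference point
        relative = current - base
        if relative >= depth:
            entry[1][:] = []  # at the limit: do not descend any further
        if relative <= depth:
            yield entry


def make_walk():
    # Fresh lists on every call, since the filter mutates them.
    return [
        (".", ["a"], []),
        (os.path.join(".", "a"), ["b"], []),
        (os.path.join(".", "a", "b"), [], []),
    ]


print([entry[0] for entry in sketch_limit_depth(make_walk(), 1)])
```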

min_depth(walk_iter, depth)[source]

Only process subdirectories beyond a minimum depth

A depth of 1 omits the top level directory, a depth of 2 starts with subdirectories 2 levels down, etc.

Path depth is calculated by counting directory separators, using the depth of the first path produced by the underlying iterator as a reference point.

NOTE: Since this filter doesn’t yield higher level directories, any subsequent directory filtering that relies on updating the subdirectory list will have no effect at the minimum depth. Accordingly, this filter should only be applied after any directory filtering operations.
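The ordering constraint in the note can be seen with a small standard-library experiment. sketch_min_depth and sketch_exclude_dirs are hypothetical stand-ins written from the descriptions above, not WalkDir's own functions. When the directory filter runs first, it prunes the subdirectory list of top before the minimum-depth stage discards the top entry; when the minimum-depth stage runs first, the filter never sees the entry for top, so an excluded directory one level down is still walked.

```python
import os
import tempfile


def sketch_min_depth(walk_iter, depth):
    base = None
    for entry in walk_iter:
        current = entry[0].count(os.sep)
        if base is None:
            base = current  # depth of the first path is the reference point
        if current - base >= depth:
            yield entry


def sketch_exclude_dirs(walk_iter, *names):
    for entry in walk_iter:
        entry[1][:] = [d for d in entry[1] if d not in names]
        yield entry


def visited(walk_iter, top):
    return sorted(os.path.relpath(entry[0], top) for entry in walk_iter)


with tempfile.TemporaryDirectory() as top:
    for sub in ("docs", ".hg"):
        os.makedirs(os.path.join(top, sub))
    # Directory filter first: top's subdirectory list is pruned in time.
    good = visited(sketch_min_depth(sketch_exclude_dirs(os.walk(top), ".hg"), 1), top)
    # Minimum depth first: the filter never sees top, so ".hg" is still walked.
    bad = visited(sketch_exclude_dirs(sketch_min_depth(os.walk(top), 1), ".hg"), top)

print(good, bad)
```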

Handle symlink loops when following symlinks during a walk

By default, prints a warning and then skips processing the directory a second time.

This can be overridden by providing the onloop callback, which accepts the offending symlink as a parameter. If the callback returns a true value, the directory is still processed; otherwise it is skipped.
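A minimal sketch of the behaviour described above, assuming loops are detected by resolving each visited directory with os.path.realpath() and remembering the results. This treats any directory reached a second time as a repeat, which is slightly broader than strict loop detection. The function name sketch_handle_symlink_loops and the warning text are hypothetical. Skipped directories also have their subdirectory list emptied so that os.walk(..., followlinks=True) stops descending into the loop. The demonstration needs a platform where os.symlink() is permitted.

```python
import os
import sys
import tempfile


def _warn_and_skip(path):
    # Default behaviour: print a warning and skip the repeated directory.
    print("Symlink loop detected, skipping {0}".format(path), file=sys.stderr)
    return False


def sketch_handle_symlink_loops(walk_iter, onloop=None):
    if onloop is None:
        onloop = _warn_and_skip
    seen = set()
    for entry in walk_iter:
        real = os.path.realpath(entry[0])
        if real in seen and not onloop(entry[0]):
            entry[1][:] = []  # stop os.walk() descending into the loop
            continue
        seen.add(real)
        yield entry


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "a"))
    # a/loop points back at top, creating a cycle when links are followed.
    os.symlink(top, os.path.join(top, "a", "loop"))
    loops = []

    def record(path):
        loops.append(os.path.relpath(path, top))
        return False  # skip, as the default callback does

    walked = os.walk(top, followlinks=True)
    visited = [os.path.relpath(e[0], top)
               for e in sketch_handle_symlink_loops(walked, record)]

print(visited, loops)
```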

Examples

Here are some simple examples of the module being used to explore the contents of its own source tree:

>>> from walkdir import filtered_walk, dir_paths, all_paths, file_paths
>>> files = file_paths(filtered_walk('.', depth=0,
...                    included_files=['*.py', '*.txt', '*.rst']))
>>> print('\n'.join(files))
./setup.py
./walkdir.py
./NEWS.rst
./test_walkdir.py
./LICENSE.txt
./VERSION.txt
./README.txt
>>> dirs = dir_paths(filtered_walk('.', depth=1, min_depth=1,
...                  excluded_dirs=['__pycache__', '.hg']))
>>> print('\n'.join(dirs))
./docs
./dist
>>> paths = all_paths(filtered_walk('.', depth=1,
...                   included_files=['*.py', '*.txt', '*.rst'],
...                   excluded_dirs=['__pycache__', '.hg']))
>>> print('\n'.join(paths))
.
./setup.py
./walkdir.py
./NEWS.rst
./test_walkdir.py
./LICENSE.txt
./VERSION.txt
./README.txt
./docs
./docs/index.rst
./docs/conf.py
./dist

Obtaining the Module

This module can be installed directly from the Python Package Index with pip:

pip install walkdir

Alternatively, you can download and unpack it manually from the walkdir PyPI page.

There are no operating system or distribution specific versions of this module: it is a pure Python module that should work on all platforms.

Supported Python versions are 2.6, 2.7 and 3.1+.

Development and Support

WalkDir is developed and maintained on BitBucket, with continuous integration services provided by Shining Panda.

Problems and suggested improvements can be posted to the issue tracker.

Release History

0.3 (2012-01-31)

  • Issue #7: filter functions now pass the tuples created by underlying iterators through without modification, using indexing rather than tuple unpacking to access values of interest. This means WalkDir now supports any underlying iterable that produces items where x[0], x[1], x[2] refers to dirpath, subdirs, files. For example, if the iterable produces collections.namedtuple instances, those will be passed through to the output of a filtered walk.

0.2.1 (2012-01-17)

  • Add MANIFEST.in so PyPI package contains all relevant files

0.2 (2012-01-04)

  • Issue #6: Added a min_depth option to filtered_walk and a new min_depth filter function to make it easier to produce a list of full subdirectory paths

  • Issue #5: Renamed path iteration convenience APIs:
    • iter_paths -> all_paths
    • iter_dir_paths -> dir_paths
    • iter_file_paths -> file_paths
  • Moved version number to a VERSION.txt file (read by both docs and setup.py)

  • Added NEWS.rst (and incorporated into documentation)

0.1 (2011-11-13)

  • Initial release
