README.creole

March 16, 2020

== IterFilesystem

Multiprocess directory iteration via {{{os.scandir()}}}

Who is this library for?

You want to process a large number of files and/or a few very big files and give feedback to the user on how long it will take.

=== Features:

  • Progress indicator:
    • Processing starts immediately; progress information is gathered in parallel by background processes
    • Progress bars via [[https://pypi.org/project/tqdm/|tqdm]]
    • Estimated time based on file count and size
  • Easy to add an extra progress bar for processing big files.
  • Skip directories and files by name via {{{fnmatch}}} patterns.
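As a rough illustration of the name-based skipping, here is a minimal sketch using the standard library's {{{fnmatch}}}. The helper {{{is_skipped()}}} is a hypothetical name for this example and not necessarily how IterFilesystem implements it; the patterns are the ones used in the example output further down.

```python
from fnmatch import fnmatch


def is_skipped(name, patterns):
    """Return True if the bare file or directory name matches any pattern."""
    # Hypothetical helper for illustration, not part of the IterFilesystem API.
    return any(fnmatch(name, pattern) for pattern in patterns)


skip_dir_patterns = (".*", "*.egg-info")
skip_file_patterns = (".*",)

for name in (".git", "iterfilesystem.egg-info", "iterfilesystem"):
    print(name, "->", "skip" if is_skipped(name, skip_dir_patterns) else "scan")
```

Note that {{{fnmatch}}} compares against the name only, so the check would be applied to each entry name rather than to the full path.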

=== How it works:

The main process starts //statistic// processes in the background via Python's {{{multiprocessing}}} module and begins the actual work immediately.

Two background //statistic// processes collect the information for the progress bars:

  • One counts all directories and files.
  • One accumulates the sizes of all files.
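The idea can be sketched roughly as follows. This is a simplified illustration under assumptions, not the library's actual code: the function names ({{{iter_entries}}}, {{{count_entries}}}, {{{sum_file_sizes}}}, {{{start_statistics}}}) are made up for this example, and error handling (e.g. unreadable directories) is left out.

```python
import multiprocessing
import os
import tempfile


def iter_entries(top):
    """Yield every os.DirEntry below ``top`` without following symlinks."""
    stack = [top]
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                yield entry


def count_entries(top):
    """Fast statistic: number of directories and files."""
    return sum(1 for _ in iter_entries(top))


def sum_file_sizes(top):
    """Slower statistic: total size of all regular files in bytes."""
    return sum(
        entry.stat(follow_symlinks=False).st_size
        for entry in iter_entries(top)
        if entry.is_file(follow_symlinks=False)
    )


def _worker(func, top, label, queue):
    # Runs in a child process and reports one (label, value) result.
    queue.put((label, func(top)))


def start_statistics(top):
    """Start both statistic processes; return the processes and their result queue."""
    queue = multiprocessing.Queue()
    jobs = (("count", count_entries), ("size", sum_file_sizes))
    processes = [
        multiprocessing.Process(target=_worker, args=(func, top, label, queue))
        for label, func in jobs
    ]
    for process in processes:
        process.start()
    return processes, queue


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        with open(os.path.join(top, "example.bin"), "wb") as file:
            file.write(b"\0" * 1024)
        processes, queue = start_statistics(top)
        # The main process could already begin its real work here.
        results = dict(queue.get() for _ in processes)
        for process in processes:
            process.join()
        print(results)
```

In this sketch each process walks the tree on its own, so the quick count is not held back by the slower size collection.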

Why two processes?

Collecting only the count of all filesystem items via {{{os.scandir()}}} is very fast, so it is the quickest way to get a first estimate of the processing time.

Getting the file size via {{{os.DirEntry.stat()}}} is significantly slower, because it requires another system call.

OK, but why not use the count alone?

Using only the total count of all {{{DirEntry}}} objects may result in a poor time estimate in the progress indication. It depends on what the actual work is: when the contents of large files are processed, it helps to know how much data has to be processed in total.

That's why both are used: the {{{DirEntry}}} count gives a very quick first forecast of the processing time, and the total size refines the prediction.
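To make the combination concrete, here is one plausible way to merge both statistics into a single estimate. The exact formula IterFilesystem uses is not stated here, so treat the plain average and the linear extrapolation below as assumptions; {{{average_progress()}}} and {{{estimate_remaining()}}} are hypothetical names.

```python
def average_progress(items_done, items_total, bytes_done, bytes_total):
    """Mean of the item-based and size-based completion fractions (0.0 to 1.0)."""
    fractions = []
    if items_total:
        fractions.append(items_done / items_total)
    if bytes_total:
        fractions.append(bytes_done / bytes_total)
    if not fractions:
        return 0.0  # no statistics available yet
    return min(1.0, sum(fractions) / len(fractions))


def estimate_remaining(elapsed_seconds, progress):
    """Linear extrapolation of the remaining time; None until progress > 0."""
    if progress <= 0:
        return None
    return elapsed_seconds * (1.0 - progress) / progress


# Most entries are done, but most of the bytes (e.g. one big file) are still pending:
progress = average_progress(100, 135, 100_000, 843_000)
remaining = estimate_remaining(2.0, progress)
print(f"{progress:.1%} done, about {remaining:.1f} s left")
```

With the count alone, this run would look about three quarters finished; averaging in the byte fraction pulls the estimate down, which is the effect the size statistic is meant to have.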

=== requirements:

=== contribute

Please: try, fork and contribute! ;)

| {{https://travis-ci.org/jedie/IterFilesystem.svg|Build Status on travis-ci.org}} | [[https://travis-ci.org/jedie/IterFilesystem/|travis-ci.org/jedie/IterFilesystem]] |
| {{https://ci.appveyor.com/api/projects/status/py5sl38ql3xciafc?svg=true|Build Status on appveyor.com}} | [[https://ci.appveyor.com/project/jedie/IterFilesystem/history|ci.appveyor.com/project/jedie/IterFilesystem]] |
| {{https://codecov.io/gh/jedie/IterFilesystem/branch/master/graph/badge.svg|Coverage Status on codecov.io}} | [[https://codecov.io/gh/jedie/IterFilesystem|codecov.io/gh/jedie/IterFilesystem]] |
| {{https://coveralls.io/repos/jedie/IterFilesystem/badge.svg|Coverage Status on coveralls.io}} | [[https://coveralls.io/r/jedie/IterFilesystem|coveralls.io/r/jedie/IterFilesystem]] |
| {{https://requires.io/github/jedie/IterFilesystem/requirements.svg?branch=master|Requirements Status on requires.io}} | [[https://requires.io/github/jedie/IterFilesystem/requirements/|requires.io/github/jedie/IterFilesystem/requirements/]] |

== Example

Use the example CLI, e.g.:

{{{
~$ git clone https://github.com/jedie/IterFilesystem.git
~$ cd IterFilesystem
~/IterFilesystem$ pipenv install
~/IterFilesystem$ pipenv shell
(IterFilesystem) ~/IterFilesystem$ print_fs_stats --help
(IterFilesystem) ~/IterFilesystem$ pip install -e .
...
Successfully installed iterfilesystem

~/IterFilesystem$ poetry run print_fs_stats --help
usage: print_fs_stats.py [-h] [-v] [--debug] [--path PATH]
                         [--skip_dir_patterns [SKIP_DIR_PATTERNS [SKIP_DIR_PATTERNS ...]]]
                         [--skip_file_patterns [SKIP_FILE_PATTERNS [SKIP_FILE_PATTERNS ...]]]

Scan filesystem and print some information

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  --debug               enable DEBUG
  --path PATH           The file path that should be scanned e.g.: "/foobar/" default is ""
  --skip_dir_patterns [SKIP_DIR_PATTERNS [SKIP_DIR_PATTERNS ...]]
                        Directory names to exclude from scan.
  --skip_file_patterns [SKIP_FILE_PATTERNS [SKIP_FILE_PATTERNS ...]]
                        File names to ignore.
}}}

Example output looks like this:

{{{
(IterFilesystem) ~/IterFilesystem$ print_fs_stats --path ~/IterFilesystem --skip_dir_patterns ".*" "*.egg-info" --skip_file_patterns ".*"
Read/process: '~/IterFilesystem'...
Skip directory patterns:
	* .*
	* *.egg-info

Skip file patterns:
	* .*

Filesystem items..:Read/process: '~/IterFilesystem'...

...

Filesystem items..: 100%|█████████████████████████████████████████|135/135 13737.14entries/s [00:00<00:00, 13737.14entries/s]
File sizes........: 100%|██████████████████████████████████████████████████████████████|843k/843k [00:00<00:00, 88.5MBytes/s]
Average progress..: 100%|███████████████████████████████████████████████████████████████████████████████████████|00:00<00:00
Current File......:, /home/jens/repos/IterFilesystem/Pipfile

Processed 135 filesystem items in 0.02 sec
SHA515 hash calculated over all file content:
10f9475b21977f5aea1d4657a0e09ad153a594ab30abc2383bf107dbc60c430928596e368ebefab3e78ede61dcc101cb638a845348fe908786cb8754393439ef
File count: 109
Total file size: 843.5 KB
6 directories skipped.
6 files skipped.
}}}
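The example above calculates one hash over the content of all files. For big files, this kind of work can report its progress per chunk, which is what an extra progress bar would be driven by. The following is only a sketch under assumptions: {{{hash_file_in_chunks()}}} and its {{{on_bytes}}} callback are hypothetical names for this example, and SHA-512 is an assumption based on the length of the digest shown above.

```python
import hashlib
import os
import tempfile


def hash_file_in_chunks(path, hasher, on_bytes=None, chunk_size=1024 * 1024):
    """Feed the file at ``path`` into ``hasher`` chunk by chunk.

    ``on_bytes`` is called with the size of each chunk, so a caller can
    advance a progress bar (for example a tqdm bar's ``update`` method).
    """
    with open(path, "rb") as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:
                break
            hasher.update(chunk)
            if on_bytes is not None:
                on_bytes(len(chunk))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        path = os.path.join(top, "big.bin")
        with open(path, "wb") as file:
            file.write(b"\0" * (3 * 1024 * 1024))

        processed = []
        hasher = hashlib.sha512()  # one hasher can be shared across all files
        hash_file_in_chunks(path, hasher, on_bytes=processed.append)
        print(f"{sum(processed)} bytes in {len(processed)} chunks")
        print(hasher.hexdigest())
```

Reading in fixed-size chunks keeps memory use flat regardless of file size and gives the progress display a natural update interval.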

== History

== Links

== Donating