README.creole
March 16, 2020 · View on GitHub
== IterFilesystem
Multiprocess directory iteration via {{{os.scandir()}}}
Who's this Lib for?
You want to process a large number of files and/or a few very big files and give feedback to the user on how long it will take.
=== Features:
- Progress indicator: ** Immediately after start: process files and indication of progress via multiprocess ** process bars via [[https://pypi.org/project/tqdm/|tqdm]] ** Estimated time based on file count and size
- Easy to implement extra process bar for big file processing.
- Skip directories and file name via fnmatch.
=== How it works:
The main process starts //statistic// processes in background via Python multiprocess and starts directly with the work.
There are two background //statistic// processes collects information for the process bars:
- Count up all directories and files.
- Accumulates the sizes of all files.
Why two processes?
Because collect only the count of all filesystem items via {{{os.scandir()}}} is very fast. This is the fastest way to predict a processing time.
Use {{{os.DirEntry.stat()}}} to get the file size is significantly slower: It requires another system call.
OK, but why two processed?
Use only the total count of all {{{DirEntry}}} may result in bad estimated time Progress indication. It depends on what the actual work is about: When processing the contents of large files, it is good to know how much total data to be processed.
That's why we used two ways: the {{{DirEntry}}} count to forecast a processing time very quickly and the size to improve the predicted time.
=== requirements:
- Python 3.6 or newer.
- {{{tqdm}}} for process bars
- {{{psutils}}} for setting process priority
- For dev.: [[https://pipenv.readthedocs.io/en/latest/install/#installing-pipenv|Pipenv]]. Packages and virtual environment manager
=== contribute
Please: try, fork and contribute! ;)
| {{https://travis-ci.org/jedie/IterFilesystem.svg|Build Status on travis-ci.org}} | [[https://travis-ci.org/jedie/IterFilesystem/|travis-ci.org/jedie/IterFilesystem]] | | {{https://ci.appveyor.com/api/projects/status/py5sl38ql3xciafc?svg=true|Build Status on appveyor.com}} | [[https://ci.appveyor.com/project/jedie/IterFilesystem/history|ci.appveyor.com/project/jedie/IterFilesystem]] | | {{https://codecov.io/gh/jedie/IterFilesystem/branch/master/graph/badge.svg|Coverage Status on codecov.io}} | [[https://codecov.io/gh/jedie/IterFilesystem|codecov.io/gh/jedie/IterFilesystem]] | | {{https://coveralls.io/repos/jedie/IterFilesystem/badge.svg|Coverage Status on coveralls.io}} | [[https://coveralls.io/r/jedie/IterFilesystem|coveralls.io/r/jedie/IterFilesystem]] | | {{https://requires.io/github/jedie/IterFilesystem/requirements.svg?branch=master|Requirements Status on requires.io}} | [[https://requires.io/github/jedie/IterFilesystem/requirements/|requires.io/github/jedie/IterFilesystem/requirements/]] |
== Example
Use example CLI, e.g.:
{{{ ~ cd IterFilesystem ~/IterFilesystem pipenv shell (IterFilesystem) ~/IterFilesystem pip install -e . ... Successfully installed iterfilesystem
~/IterFilesystem poetry run print_fs_stats --help usage: print_fs_stats.py [-h] [-v] [--debug] [--path PATH] [--skip_dir_patterns [SKIP_DIR_PATTERNS [SKIP_DIR_PATTERNS ...]]] [--skip_file_patterns [SKIP_FILE_PATTERNS [SKIP_FILE_PATTERNS ...]]]
Scan filesystem and print some information
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
--debug enable DEBUG
--path PATH The file path that should be scanned e.g.: "/foobar/"
default is ""
--skip_dir_patterns [SKIP_DIR_PATTERNS [SKIP_DIR_PATTERNS ...]]
Directory names to exclude from scan.
--skip_file_patterns [SKIP_FILE_PATTERNS [SKIP_FILE_PATTERNS ...]]
File names to ignore.
}}}
example output looks like this:
{{{
(IterFilesystem) ~/IterFilesystem print_fs_stats --path /IterFilesystem --skip_dir_patterns "." ".egg-info" --skip_file_patterns ".*"
Read/process: '/IterFilesystem'...
Skip directory patterns:
* .*
* *.egg-info
Skip file patterns: * .*
Filesystem items..:Read/process: '~/IterFilesystem'...
...
Filesystem items..: 100%|█████████████████████████████████████████|135/135 13737.14entries/s [00:00<00:00, 13737.14entries/s] File sizes........: 100%|██████████████████████████████████████████████████████████████|843k/843k [00:00<00:00, 88.5MBytes/s] Average progress..: 100%|███████████████████████████████████████████████████████████████████████████████████████|00:00<00:00 Current File......:, /home/jens/repos/IterFilesystem/Pipfile
Processed 135 filesystem items in 0.02 sec SHA515 hash calculated over all file content: 10f9475b21977f5aea1d4657a0e09ad153a594ab30abc2383bf107dbc60c430928596e368ebefab3e78ede61dcc101cb638a845348fe908786cb8754393439ef File count: 109 Total file size: 843.5 KB 6 directories skipped. 6 files skipped. }}}
== History
- [[https://github.com/jedie/IterFilesystem/compare/v1.4.3...master|dev - compare v1.4.3...master]] ** TBC
- [[https://github.com/jedie/IterFilesystem/compare/v1.4.2...v1.4.3|16.03.2020 - v1.4.3]] ** Use logging and remove "verbose mode" ** Nicer "Average progess" bar ** Bugfix "Current File" bar: remove comma
- [[https://github.com/jedie/IterFilesystem/compare/v1.4.1...v1.4.2|16.02.2020 - v1.4.2]] ** iterate over sorted dir entries ** update CI pipelines
- [[https://github.com/jedie/IterFilesystem/compare/v1.4.0...v1.4.1|02.02.2020 - v1.4.1]] ** Bugfix {{{human_filesize}}}
- [[https://github.com/jedie/IterFilesystem/compare/v1.3.1...v1.4.0|02.02.2020 - v1.4.0]] ** {{{stats_helper.abort}}} exists always usefull to get information if KeyboardInterrupt was used ** use poetry and modernize project setup
- [[https://github.com/jedie/IterFilesystem/compare/v1.3.0...v1.3.1|20.10.2019 - v1.3.1]] ** Bugfix if scan directory is completely empty
- [[https://github.com/jedie/IterFilesystem/compare/v1.2.0...v1.3.0|13.10.2019 - v1.3.0]] ** Set ionice and nice priority via psutils
- [[https://github.com/jedie/IterFilesystem/compare/v1.1.0...v1.2.0|13.10.2019 - v1.2.0]] ** Refactor API ** cleanup statistics and process bar ** handle access errors like: //Permission denied// ** fix tests
- [[https://github.com/jedie/IterFilesystem/compare/v1.0.0...v1.1.0|12.10.2019 - v1.1.0]] ** don't create separate process for worker: Just do the work in main process ** dir/file filter uses now {{{fnmatch}}}
- [[https://github.com/jedie/IterFilesystem/compare/v0.2.0...v1.0.0|12.10.2019 - v1.0.0]] ** refactoring: *** don't use {{{persist-queue}}} *** switch from threading to multiprocessing *** enhance progress display with multiple {{{tqdm}}} process bars
- [[https://github.com/jedie/IterFilesystem/compare/v0.1.0...v0.2.0|15.09.2019 - v0.2.0]] ** store persist queue in temp directory ** Don't catch {{{process_path_item}}} errors, this should be made in child class
- [[https://github.com/jedie/IterFilesystem/compare/v0.0.1...v0.1.0|15.09.2019 - v0.1.0]] ** add some project meta files and tests ** setup CI ** fix tests
- [[https://github.com/jedie/IterFilesystem/commit/db89a467a548a969d9d2cdd48adb92114a8833fe|15.09.2019 - v0.0.1]] ** first Release on PyPi
== Links
== Donating
- [[https://www.paypal.me/JensDiemer|paypal.me/JensDiemer]]
- [[https://flattr.com/submit/auto?uid=jedie&url=https%3A%2F%2Fgithub.com%2Fjedie%2FIterFilesystem%2F|Flattr This!]]
- Send [[http://www.bitcoin.org/|Bitcoins]] to [[https://blockexplorer.com/address/1823RZ5Md1Q2X5aSXRC5LRPcYdveCiVX6F|1823RZ5Md1Q2X5aSXRC5LRPcYdveCiVX6F]]