Experiment Manager (exman)

October 10, 2019 ยท View on GitHub

Simple and minimalistic utility to manage many experiments runs and custom analysis of results

Why another custom solution?

My job is to do research in Deep Learning and I have dozens of different experiments. Testing one hypothesis usually required several runs over parameter grid. Plotting and visualizing results is often ad-hoc and updating code producing output is a kind of overhead. Instead I decided to collect all results in Jupyter notebook and create plots kind of interest ~ parameters. As I said, plotting that is a separate task almost every time. Such tools as ModelDB <https://github.com/mitdbg/modeldb>__ provide you with simple visualizations so that they can be easily aggregated for model comparison. Testing a hypothesis is not about model comparison and thus requires special treatment.

Visualizing results became a kind of pain, you had to remember a mapping parameters -> results, separating results into different folders made even more mess. I had really bad experience in visualizations. I got that all I need was to iterate over folder with results and apply the same function to it.

Installation

.. code:: bash

pip install -U git+https://github.com/ferrine/exman.git#egg=exman
# or
pip install exman

Simple Start

Simple drop in replacement of standard argparse.ArgumentParser

.. code:: python

#file: main.py
import exman
# you should always use `exman.simpleroot(__file__)` unless you want another dir
parser = exman.ExParser(root=exman.simpleroot(__file__))  # `root = ./exman` relative to the main file
parser.add_argument(...)

You then just add arguments as you did before without any change.

Best Practices

Error Handling in main


Since 0.0.3 you can use the following context manager. If ``main()``
function fails it will be moved to ``exman/fails``

.. code:: python

    import exman
    # you should always use `exman.simpleroot(__file__)` unless you want another dir
    parser = exman.ExParser(root=exman.simpleroot(__file__))  # `root = ./exman` relative to the main file
    parser.add_argument(...)
    ...
    if __name__ == '__main__':
        args = parser.parse_args()
        with args.safe_experiment:
            # do your stuff
            main(args)


Keep Your Repository Clean

To avoid non reproducible results you can ensure you have commited all changes. Exman will take care and will log hash for the commit and diff if any. To use these features you should hint the parser with the repo.

.. code:: python

import exman

parser = exman.ExParser(root=exman.simpleroot(__file__), git=True)
# less fragile solution, but works only locally
parser = exman.ExParser(root=exman.simpleroot(__file__), git="/abs/path/to/repo")
# an ok solution, if you are sure in the relative path
parser = exman.ExParser(root=exman.simpleroot(__file__),
    git=os.path.join(os.path.dirname(__file__), "relative", "path", "goes", "here"),
    git_assert_clean=True  # run assertion check before each run. False by default.
)

In cli of your favorite experiment you can skip the assertion if you want to:

.. code:: bash

python train.py --git-dirty --other-args

Optional Parameters


To avoid issues in `reproducing experiments <#rerunning-experiment>`__
you should consider using ``exman.optional(type)`` for optional
arguments

.. code:: python

    import exman
    # you should always use `exman.simpleroot(__file__)` unless you want another dir
    parser = exman.ExParser(root=exman.simpleroot(__file__))  # `root = ./exman` relative to the main file
    parser.add_argument('--myarg', type=exman.optional(int))

Validators
~~~~~~~~~~

In simple argparser you cant easily validate multiple arguments, it is
easy in Exman. You can create an informative error message

.. code:: python

    import exman
    # you should always use `exman.simpleroot(__file__)` unless you want another dir
    parser = exman.ExParser(root=exman.simpleroot(__file__))  # `root = ./exman` relative to the main file
    parser.add_argument(...)
    # here `p` stands for initial namespace parsed from arguments
    parser.register_validator(lambda p: p.arg1 != p.arg2 or p.arg3 == p.arg4,
                              # next line will be autoformatted for you using .format
                              'You have provided wrong set of arguments: {arg1}, {arg2}, {arg3}, {arg4}')

Advanced validators can raise `exman.ArgumentError` that contains a better message than the one in validators function

Marry Pandas with Exman

Pandas is a great tool to work with table data. Experiments are the same data and can be loaded in python. So all you need is to run batch of experiments and open a Jupyter notebook.

.. code:: python

import exman
index = exman.Index(exman.simpleroot('/path/to/main.py'))
experiments = index.info()

Table has columns time (datetime64[ns]) of experiment and root (pathlib.Path) path to results. Moreover this table has all other parameters of the experiment. You later can filter/order the results according to them and have easy-breezy access to results folder and it's content.

.. code:: python

for i, ex in experiments.iterrows():
    # do some actions
    # use ex.param for parameters
    # ex.root / 'plot.png' for file paths
    ...

Local Configuration


You can store local configuration files in your experiment folder. You
should provide the filename to ExParser as well.

.. code:: python

    import exman
    # you should always use `exman.simpleroot(__file__)` unless you want another dir
    parser = exman.ExParser(
        root=exman.simpleroot(__file__),
        default_config_files=['local.cfg']
    )

Local configuration stores globally defined default values, they
override defaults set in main file

Auto Structure
~~~~~~~~~~~~~~

If you want argument specific human friendly directory structure you can
tie specific argument names for that

.. code:: python

    import exman
    # you should always use `exman.simpleroot(__file__)` unless you want another dir
    parser = exman.ExParser(
        root=exman.simpleroot(__file__),
        automark=['arg1', 'constant']
    )
    parser.add_argument('--arg1')

Later you can see your `marked folder <#directory-structure-and-cli>`__
looks like this

::

    exman/marked/arg1/<arg1>/constant/<name-of-experiment>/...

This can be usefull if you work in a team. Write in ``main.py``

.. code:: python

    import exman
    # you should always use `exman.simpleroot(__file__)` unless you want another dir
    parser = exman.ExParser(
        root=exman.simpleroot(__file__),
        automark=['user'],
        # store `user: myuser` content in local.cfg
        default_config_files=['local.cfg']
    )
    parser.add_argument('--user')

After you've done that, your team runs can be stored in a single exman
directory assuming all access rights are correctly set up.

::

    exman/marked/user/<username>/constant/<name-of-experiment>/...

Directory Structure and CLI
---------------------------

In command line runs will look also the same:

::

    python main.py --param1 foo --param2 bar

Things change if you actually run the program. It dumps all the parsed
parameters combined with defaults into Yaml style file into location
``root/runs/<name-of-experiment>/params.yaml``. ``name-of-experiment``
is generic and autocreated on the fly. For quick look or search there
are symlinks in the ``index`` folder e.g.
``root/index/<name-of-experiment>.yaml``. Since a lot of experiments are
created and debugging is sometimes needed, you might want not to create
debug experiments in ``runs`` folder. For that case you just add
``--tmp`` flag and new filed will be written to
``root/tmp/<name-of-experiment>`` folder. That is convenient as you both
do not loose important info about experiment and results and can restore
these symlinks in index by hand if needed.

::

    root
    |-- runs
    |   `-- xxxxxx-YYYY-mm-dd-HH-MM-SS
    |       |-- params.yaml
    |       `-- ...
    |-- fails
    |-- index
    |   `-- xxxxxx-YYYY-mm-dd-HH-MM-SS.yaml (symlink)
    |-- marked
    |   `-- <mark>
    |       `-- xxxxxx-YYYY-mm-dd-HH-MM-SS (symlink)
    |           |-- params.yaml
    |           `-- ...
    `-- tmp
        `-- xxxxxx-YYYY-mm-dd-HH-MM-SS
            |-- params.yaml
            `-- ...

Rerunning experiment

If you want to reproduce an experiment, you can provide source configuration file in yaml format. For example:

.. code:: bash

python main.py --config root/index/<name-of-experiment-to-reproduce>.yaml

All the values will be restored from the previous run. You can also modify old values in --config ... using

.. code:: bash

python main.py --config root/index/<name-of-experiment-to-reproduce>.yaml --override-param=new_value

In case you do not want to restore some argument from saved config (it may be some dynamic setted variable) you should use volatile=True in add_argument:

.. code:: python

parser.add_argument('--my_dynamic_id', default=os.environ.get('AUTOSETTED_ID'), volatile=True)

Marking experiments

If you like some experiments you can mark them for easier later access.

::

cd root_of_exman_dir
exman mark <key> <#ex1> [<#ex2> <#ex3> ...]

and later in Jupyter

.. code:: python

index = exman.Index(exman.simpleroot('/path/to/main.py'))
experiments = index.info('<key>')
# assuming you work in a team and use best practice advice
user_experiments = index.info('user/username')

Deleting experiments

::

cd root_of_exman_dir
# delete only index
exman delete <#ex1> [<#ex2> <#ex3> ...]
# delete all files
exman delete --all <#ex1> [<#ex2> <#ex3> ...]