Command Line Interface

October 16, 2023 ยท View on GitHub

The weasel CLI includes subcommands for working with Weasel projects, end-to-end workflows for building and deploying custom pipelines.

:clipboard: clone

Clone a project template from a Git repository. Calls into git under the hood and can use the sparse checkout feature if available, so you're only downloading what you need. By default, Weasel's project templates repo is used, but you can provide any other repo (public or private) that you have access to using the --repo option.

python -m weasel clone [name] [dest] [--repo] [--branch] [--sparse]

:bulb: Example usage

$ python -m weasel clone pipelines/ner_wikiner
NameDescription
nameThe name of the template to clone, relative to the repo. Can be a top-level directory or a subdirectory like dir/template. str (positional)
destWhere to clone the project. Defaults to current working directory. Path (positional)
--repo, -rThe repository to clone from. Can be any public or private Git repo you have access to. str (option)
--branch, -bThe branch to clone from. Defaults to master. str (option)
--sparse, -SEnable sparse checkout to only check out and download what's needed. Requires Git v22.2+. bool (flag)
--help, -hShow help message and available arguments. bool (flag)
CREATESThe cloned project directory.

:open_file_folder: assets

Fetch project assets like datasets and pretrained weights. Assets are defined in the assets section of the project.yml. If a checksum is provided, the file is only downloaded if no local file with the same checksum exists and Weasel will show an error if the checksum of the downloaded file doesn't match. If assets don't specify a url they're considered "private" and you have to take care of putting them into the destination directory yourself. If a local path is provided, the asset is copied into the current project.

python -m weasel assets [project_dir]
NameDescription
project_dirPath to project directory. Defaults to current working directory. Path (positional)
--extra, -e 3.3.1Download assets marked as "extra". Default false. bool (flag)
--sparse, -SEnable sparse checkout to only check out and download what's needed. Requires Git v22.2+. bool (flag)
--help, -hShow help message and available arguments. bool (flag)
CREATESDownloaded or copied assets defined in the project.yml.

:rocket: run

Run a named command or workflow defined in the project.yml. If a workflow name is specified, all commands in the workflow are run, in order. If commands define dependencies or outputs, they will only be re-run if state has changed. For example, if the input dataset changes, a preprocessing command that depends on those files will be re-run.

python -m weasel run [subcommand] [project_dir] [--force] [--dry]
NameDescription
subcommandName of the command or workflow to run. str (positional)
project_dirPath to project directory. Defaults to current working directory. Path (positional)
--force, -FForce re-running steps, even if nothing changed. bool (flag)
--dry, -DPerform a dry run and don't execute scripts. bool (flag)
--help, -hShow help message and available arguments. bool (flag)
EXECUTESThe command defined in the project.yml.

:arrow_up: push

Upload all available files or directories listed as in the outputs section of commands to a remote storage. Outputs are archived and compressed prior to upload, and addressed in the remote storage using the output's relative path (URL encoded), a hash of its command string and dependencies, and a hash of its file contents. This means push should never overwrite a file in your remote. If all the hashes match, the contents are the same and nothing happens. If the contents are different, the new version of the file is uploaded. Deleting obsolete files is left up to you.

Remotes can be defined in the remotes section of the project.yml. Under the hood, Weasel uses cloudpathlib to communicate with the remote storages, so you can use any protocol that CloudPath supports, including S3, Google Cloud Storage, and the local filesystem, although you may need to install extra dependencies to use certain protocols.

python -m weasel push [remote] [project_dir]

:bulb: Example

$ python -m weasel push my_bucket
remotes:
  my_bucket: 's3://my-weasel-bucket'
NameDescription
remoteThe name of the remote to upload to. Defaults to "default". str (positional)
project_dirPath to project directory. Defaults to current working directory. Path (positional)
--help, -hShow help message and available arguments. bool (flag)
UPLOADSAll project outputs that exist and are not already stored in the remote.

:arrow_down: pull

Download all files or directories listed as outputs for commands, unless they are already present locally. When searching for files in the remote, pull won't just look at the output path, but will also consider the command string and the hashes of the dependencies. For instance, let's say you've previously pushed a checkpoint to the remote, but now you've changed some hyper-parameters. Because you've changed the inputs to the command, if you run pull, you won't retrieve the stale result. If you train your pipeline and push the outputs to the remote, the outputs will be saved alongside the prior outputs, so if you change the config back, you'll be able to fetch back the result.

Remotes can be defined in the remotes section of the project.yml. Under the hood, Weasel uses cloudpathlib to communicate with the remote storages, so you can use any protocol that CloudPath supports, including S3, Google Cloud Storage, and the local filesystem, although you may need to install extra dependencies to use certain protocols.

python -m weasel pull [remote] [project_dir]

:bulb: Example

$ python -m weasel pull my_bucket
remotes:
  my_bucket: 's3://my-weasel-bucket'
NameDescription
remoteThe name of the remote to download from. Defaults to "default". str (positional)
project_dirPath to project directory. Defaults to current working directory. Path (positional)
--help, -hShow help message and available arguments. bool (flag)
DOWNLOADSAll project outputs that do not exist locally and can be found in the remote.

:closed_book: document

Auto-generate a pretty Markdown-formatted README for your project, based on its project.yml. Will create sections that document the available commands, workflows and assets. The auto-generated content will be placed between two hidden markers, so you can add your own custom content before or after the auto-generated documentation. When you re-run the project document command, only the auto-generated part is replaced.

python -m weasel document [project_dir] [--output] [--no-emoji]

:bulb: Example usage

$ python -m weasel document --output README.md

For more examples, see the templates in our projects repo.

NameDescription
project_dirPath to project directory. Defaults to current working directory. Path (positional)
--output, -oPath to output file or - for stdout (default). If a file is specified and it already exists and contains auto-generated docs, only the auto-generated docs section is replaced. Path (positional)
--no-emoji, -NEDon't use emoji in the titles. bool (flag)
CREATESThe Markdown-formatted project documentation.

:repeat: dvc

Auto-generate Data Version Control (DVC) config file. Calls dvc run with --no-exec under the hood to generate the dvc.yaml. A DVC project can only define one pipeline, so you need to specify one workflow defined in the project.yml. If no workflow is specified, the first defined workflow is used. The DVC config will only be updated if the project.yml changed. For details, see the DVC integration docs.

Warning

This command requires DVC to be installed and initialized in the project directory, e.g. via dvc init. You'll also need to add the assets you want to track with dvc add.

python -m weasel dvc [project_dir] [workflow] [--force] [--verbose] [--quiet]

:bulb: Example

$ git init
$ dvc init
$ python -m weasel dvc all
NameDescription
project_dirPath to project directory. Defaults to current working directory. Path (positional)
workflowName of workflow defined in project.yml. Defaults to first workflow if not set. Optional[str] (option)
--force, -FForce-updating config file. bool (flag)
--verbose, -VPrint more output generated by DVC. bool (flag)
--quiet, -qPrint no output generated by DVC. bool (flag)
--help, -hShow help message and available arguments. bool (flag)
CREATESA dvc.yaml file in the project directory, based on the steps defined in the given workflow.