Streamlining Data-Intensive Biology With Workflow Systems
November 16, 2020
Accepted Manuscript
bioRxiv preprint (initially preprinted 07/01/2020)
Code of Conduct
This project operates under a code of conduct. Participating in the project in any way (issues, pull requests, gitter, or other media) indicates that you agree that you will follow the code of conduct. We take this very seriously. If you experience harassment or notice violations of the code of conduct, please raise the issue to one of the project organizers (@taylorreiter or @bluegenes).
Project Description
As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. Data-centric workflow systems can alleviate some of these challenges, but knowledge of and training in these techniques are still lacking. Our goal is to generate a helpful set of strategies for leveraging workflow systems to streamline large-scale biological analyses.
Our initial version has been much improved through iterations of feedback primarily from members and friends of the DIB-lab. While the practices are written with specific examples for high-throughput sequencing data, we hope many of the perspectives and guidance provided by the document apply more generally to all workflow-enabled biology.
This repository is a living document (written with manubot) that aims to consolidate and integrate helpful information about workflow systems and their applications in data-intensive biology. We welcome constructive feedback from workflow-enabled biologists of all levels anywhere in the world.
Contributions
You'll need to make a free GitHub account.
Instructions and procedures for contributing are outlined here.
We will follow the ICMJE Guidelines for determining authorship.
Pull Requests
If you are not familiar with git and GitHub, you can use these directions to start contributing.
Please feel encouraged to ask questions by opening a Request for Help issue.
This project is a collaborative effort that will benefit from the expertise of scientists across a wide range of workflow applications!
Manubot
Manubot is a system for writing scholarly manuscripts via GitHub.
Manubot automates citations and references, versions manuscripts using git, and enables collaborative writing via GitHub.
An overview manuscript presents the benefits of collaborative writing with Manubot and its unique features.
The rootstock repository is a general purpose template for creating new Manubot instances, as detailed in SETUP.md.
See USAGE.md for documentation on how to write a manuscript.
Please open an issue for questions related to Manubot usage, bug reports, or general inquiries.
Repository directories & files
- This file is called README.md. It is the centralized document for the repository and will help direct users to other relevant information.
- CONTRIBUTING.md contains procedures and directions for contributing to this effort.
- INSTRUCTIONS.md contains instructions for new GitHub users on how to navigate GitHub in the browser, as well as GitHub vocabulary. It also includes instructions for more experienced users about the procedures we recommend and how to run manubot on the command line.
- USAGE.md describes how to format text, cite references, add figures and tables, etc.
- SETUP.md includes information about setting up manubot.
- LICENSE.md and LICENSE-CC0.md contain the licenses associated with manubot and with the content we are developing in this project. Please see the "License" section below.
The directories are as follows:
- content contains the manuscript source, which includes markdown files as well as inputs for citations and references. These are the files that most contributors will be editing. See USAGE.md for more information.
- output contains the outputs (generated files) from Manubot, including the resulting manuscripts. You should not edit these files manually, because they will get overwritten.
- webpage is a directory meant to be rendered as a static webpage for viewing the HTML manuscript.
- build contains commands and tools for building the manuscript.
- ci contains files necessary for deployment via continuous integration.
License
Except when noted otherwise, the entirety of this repository is licensed under a CC BY 4.0 License (LICENSE.md), which allows reuse with attribution.
Please attribute by linking to https://github.com/dib-lab/2020-workflows-paper.
Since CC BY is not ideal for code and data, certain repository components are also released under the CC0 1.0 public domain dedication (LICENSE-CC0.md).
All files matched by the following glob patterns are dual licensed under CC BY 4.0 and CC0 1.0:
- *.sh
- *.py
- *.yml / *.yaml
- *.json
- *.bib
- *.tsv
- .gitignore
All other files are only available under CC BY 4.0, including:
- *.md
- *.html
- *.pdf
- *.docx
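As a rough illustration of how the glob patterns above sort files into the two licensing groups, here is a minimal Python sketch. It is not part of the repository. The function name `licenses_for`, the example paths, and the choice to match on the file name only are assumptions made for illustration. It also ignores the "except when noted otherwise" caveat, so any per-file notice still takes precedence.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import List

# Patterns copied from the dual-license list above.
DUAL_LICENSED_PATTERNS = [
    "*.sh", "*.py", "*.yml", "*.yaml", "*.json", "*.bib", "*.tsv", ".gitignore",
]


def licenses_for(path: str) -> List[str]:
    """Return the licenses that apply to a file, judged by its name alone.

    Hypothetical helper: matching only the final path component is an
    assumption, not something the repository specifies.
    """
    name = PurePosixPath(path).name
    if any(fnmatchcase(name, pattern) for pattern in DUAL_LICENSED_PATTERNS):
        return ["CC BY 4.0", "CC0 1.0"]
    # Everything else falls back to CC BY 4.0 only.
    return ["CC BY 4.0"]


if __name__ == "__main__":
    # Example paths are illustrative and may not exist in the repository.
    for example in ["build/build.sh", ".gitignore", "README.md"]:
        print(example, "->", " and ".join(licenses_for(example)))
```

Using `fnmatchcase` rather than `fnmatch` keeps the matching case-sensitive on every platform, so the result does not depend on the operating system.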
Please open an issue for any question related to licensing.
Attribution
Many of the documents (especially *.md documents) and issues presented in this repository were modified from another manubot repository.