bulk-translate 0.25.2
February 22, 2025
Third-party providers hosting↗️
A tiny, no-strings Python package for translating massive CSV/JSONL files,
with native support for pre-annotated fixed spans that the translator leaves unchanged.
Description
📘 bulk-translate features
The out-of-the-box features of bulk-translate are:
- ✅ Support for spans for annotation / optional translation.
- ✅ Native implementation of two translation modes:
  - fast-mode: exploits extra characters to group all the text parts into a single batch, which is deconstructed after translation.
  - accurate: translates each text part individually.
- ✅ No strings: you're free to adopt any LM / LLM backend.
  - Supports googletrans by default.
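The difference between the two translation modes can be illustrated with a minimal sketch. This is not the package's actual implementation: the names `translate_fast`, `translate_accurate`, `fake_translate` and the `SEPARATOR` marker are hypothetical, and the sketch assumes the backend passes the marker through unchanged.

```python
from typing import Callable, List

Translator = Callable[[str], str]

# Hypothetical marker; assumed to survive translation unchanged.
SEPARATOR = "\n@@\n"


def translate_accurate(parts: List[str], translate: Translator) -> List[str]:
    """Accurate mode: one backend call per text part."""
    return [translate(part) for part in parts]


def translate_fast(parts: List[str], translate: Translator) -> List[str]:
    """Fast mode: join all parts, translate once, then split back."""
    if not parts:
        return []
    translated = translate(SEPARATOR.join(parts))
    result = [p.strip() for p in translated.split(SEPARATOR.strip())]
    if len(result) != len(parts):
        # The marker was altered or dropped, so fall back to per-part calls.
        return translate_accurate(parts, translate)
    return result


def fake_translate(text: str) -> str:
    """Stand-in backend for the demo: upper-cases instead of translating."""
    return text.upper()


parts = ["hello", "big world", "bye"]
fast = translate_fast(parts, fake_translate)
accurate = translate_accurate(parts, fake_translate)
print(fast)
print(fast == accurate)
```

The fast variant makes a single backend call but depends on the marker surviving translation, which is why the sketch falls back to per-part calls when the number of parts does not match.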
Installation
From PyPI:
pip install bulk-translate
or latest version from here:
pip install git+https://github.com/nicolay-r/bulk-translate
Usage
API
👉 Follow this notebook tutorial at nlp-thirdgate
Command Line / Shell
NOTE: Spans are supported only in JSON-lines format.
NOTE: Requires the source_iter package to be installed.
The command below translates the test.tsv example data, in which annotated entities are enclosed in square brackets:
python -m bulk_translate.translate \
--src "test/data/test.tsv" \
--schema '{"translated":"{text}"}' \
--adapter "dynamic:models/googletrans_310a.py:GoogleTranslateModel" \
--output "test-translated.jsonl" \
--batch-size 10 \
%%m \
--src "auto" \
--dest "ru"
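To make the idea of bracketed spans concrete, here is a rough sketch of how a text with entities in square brackets could be split so that only the parts outside the brackets reach the translator. It is an illustration under assumptions, not the package's code: `split_spans`, `translate_keeping_spans` and the example sentence are hypothetical, and `str.upper` stands in for a real translation backend.

```python
import re
from typing import Callable, List, Tuple

# Matches a non-nested span such as "[Paris]".
SPAN_PATTERN = re.compile(r"\[([^\[\]]*)\]")


def split_spans(text: str) -> List[Tuple[str, bool]]:
    """Split text into (chunk, is_span) pairs; spans are bracketed entities."""
    chunks: List[Tuple[str, bool]] = []
    pos = 0
    for match in SPAN_PATTERN.finditer(text):
        if match.start() > pos:
            chunks.append((text[pos:match.start()], False))
        chunks.append((match.group(1), True))
        pos = match.end()
    if pos < len(text):
        chunks.append((text[pos:], False))
    return chunks


def translate_keeping_spans(text: str, translate: Callable[[str], str]) -> str:
    """Translate only the chunks outside the brackets; keep spans as they are."""
    out = []
    for chunk, is_span in split_spans(text):
        out.append(f"[{chunk}]" if is_span else translate(chunk))
    return "".join(out)


# Hypothetical input line, not taken from test.tsv.
example = "[Anna] moved to [Paris] last year"
chunks = split_spans(example)
kept = translate_keeping_spans(example, str.upper)
print(chunks)
print(kept)
```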
Powered by
The pipeline construction components were taken from AREkit [github]