README.org

November 19, 2017 ยท View on GitHub

** Installation

Create Python 3 virtual environment and install slob.py as described at http://github.com/itkach/slob/.

In this virtual environment run

#+BEGIN_SRC sh pip install git+https://github.com/itkach/tei2slob.git #+END_SRC

** Usage

[[http://sourceforge.net/projects/freedict/files/][Download]] a dictionary archive and unpack it. For example:

#+BEGIN_SRC sh wget http://downloads.sourceforge.net/project/freedict/English%20-%20German/0.3.6/freedict-eng-deu-0.3.6.src.tar.bz2 tar -xvf freedict-eng-deu-0.3.6.src.tar.bz2 #+END_SRC

Then run converter:

#+BEGIN_SRC sh tei2slob eng-deu/eng-deu.tei #+END_SRC

eng-deu-0.3.6.slob will be created in the same directory.

Converter attempts to populate dictionary tags based on information in .tei header section, but it may fail because the way some elements (like license name) is not standardized and varies across dictionaries, so be sure to check the tags:

#+BEGIN_SRC sh slob info eng-deu-0.3.6.slob #+END_SRC

Set tag values as necessary, for example:

#+BEGIN_SRC sh slob tag -n license.name -v "GNU General Public License" eng-deu-0.3.6.slob slob tag -n license.url -v "http://www.gnu.org/licenses/gpl.html" eng-deu-0.3.6.slob slob tag -n created.by -v me@example.com eng-deu-0.3.6.slob #+END_SRC

uri is an important tag. When different dictionaries have the same uri it means they contain keys belonging to the same logical dictionary. So when compiling a new version of existing dictionary make sure uri remains the same.

#+BEGIN_SRC sh

usage: tei2slob [-h] [-o OUTPUT_FILE] [-c {lzma2,zlib}] [-b BIN_SIZE] [-a CREATED_BY] [-w WORK_DIR] input_file

positional arguments: input_file TEI file name

optional arguments: -h, --help show this help message and exit -o OUTPUT_FILE, --output-file OUTPUT_FILE Name of output slob file -c {lzma2,zlib}, --compression {lzma2,zlib} Name of compression to use. Default: zlib -b BIN_SIZE, --bin-size BIN_SIZE Minimum storage bin size in kilobytes. Default: 256 -a CREATED_BY, --created-by CREATED_BY Value for created.by tag. Identifier (e.g. name or email) for slob file creator -w WORK_DIR, --work-dir WORK_DIR Directory for temporary files created during compilation. Default: .

#+END_SRC