README.org
November 19, 2017 ยท View on GitHub
- tei2slob This is a tool to convert [[http://www.tei-c.org/][TEI]] [[http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DI.html][P5]] dictionaries to [[https://github.com/itkach/slob][slob]] format. Some free [[http://www.tei-c.org/][TEI]] [[http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DI.html][P5]] dictionaries are available at http://freedict.org/
** Installation
Create Python 3 virtual environment and install slob.py as described at http://github.com/itkach/slob/.
In this virtual environment run
#+BEGIN_SRC sh pip install git+https://github.com/itkach/tei2slob.git #+END_SRC
** Usage
[[http://sourceforge.net/projects/freedict/files/][Download]] a dictionary archive and unpack it. For example:
#+BEGIN_SRC sh wget http://downloads.sourceforge.net/project/freedict/English%20-%20German/0.3.6/freedict-eng-deu-0.3.6.src.tar.bz2 tar -xvf freedict-eng-deu-0.3.6.src.tar.bz2 #+END_SRC
Then run converter:
#+BEGIN_SRC sh tei2slob eng-deu/eng-deu.tei #+END_SRC
eng-deu-0.3.6.slob will be created in the same directory.
Converter attempts to populate dictionary tags based on information
in .tei header section, but it may fail because the way some elements
(like license name) is not standardized and varies across
dictionaries, so be sure to check the tags:
#+BEGIN_SRC sh slob info eng-deu-0.3.6.slob #+END_SRC
Set tag values as necessary, for example:
#+BEGIN_SRC sh slob tag -n license.name -v "GNU General Public License" eng-deu-0.3.6.slob slob tag -n license.url -v "http://www.gnu.org/licenses/gpl.html" eng-deu-0.3.6.slob slob tag -n created.by -v me@example.com eng-deu-0.3.6.slob #+END_SRC
uri is an important tag. When different dictionaries have the
same uri it means they contain keys belonging to the same
logical dictionary. So when compiling a new version of existing
dictionary make sure uri remains the same.
#+BEGIN_SRC sh
usage: tei2slob [-h] [-o OUTPUT_FILE] [-c {lzma2,zlib}] [-b BIN_SIZE] [-a CREATED_BY] [-w WORK_DIR] input_file
positional arguments: input_file TEI file name
optional arguments: -h, --help show this help message and exit -o OUTPUT_FILE, --output-file OUTPUT_FILE Name of output slob file -c {lzma2,zlib}, --compression {lzma2,zlib} Name of compression to use. Default: zlib -b BIN_SIZE, --bin-size BIN_SIZE Minimum storage bin size in kilobytes. Default: 256 -a CREATED_BY, --created-by CREATED_BY Value for created.by tag. Identifier (e.g. name or email) for slob file creator -w WORK_DIR, --work-dir WORK_DIR Directory for temporary files created during compilation. Default: .
#+END_SRC