LM_Dataformat

September 14, 2020

Utilities for storing and reading data for language model (LM) training.

Basic Usage

To write:

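from lm_dataformat import Archive

ar = Archive('output_dir')

# placeholder input: any iterable of document strings
documents = ['first document text', 'second document text']

for text in documents:
  # do other stuff (cleaning, filtering, ...)
  ar.add_data(text, meta={
    'example': 'stuff',
    'someothermetadata': ['othermetadata', 'otherrandomstuff'],
    'otherotherstuff': True
  })

# remember to commit at the end! The archive is not finalized until commit() runs.
ar.commit()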
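To make the write-then-commit flow concrete, here is a rough, standard-library-only sketch of what such an archive could do. It is not lm_dataformat's implementation: the class name `ToyArchive`, the gzip compression (swapped in so the sketch needs no third-party packages), the file naming, and the one-JSON-object-per-line layout with `text` and `meta` fields are all assumptions.

```python
import gzip
import json
import os


class ToyArchive:
    """Hypothetical archive: buffers documents in memory and writes them on commit()."""

    def __init__(self, out_dir):
        self.out_dir = out_dir
        os.makedirs(out_dir, exist_ok=True)
        self.buffer = []
        self.commits = 0  # used to give each committed file a distinct name

    def add_data(self, text, meta=None):
        # One record per document: the raw text plus a JSON-serializable metadata dict.
        self.buffer.append({'text': text, 'meta': meta if meta is not None else {}})

    def commit(self, archive_name='default'):
        # Nothing reaches disk until commit(); the buffer is then written and cleared.
        name = f'data_{self.commits}_{archive_name}.jsonl.gz'
        path = os.path.join(self.out_dir, name)
        with gzip.open(path, 'wt', encoding='utf-8') as f:
            for record in self.buffer:
                f.write(json.dumps(record) + '\n')
        self.buffer = []
        self.commits += 1
        return path


if __name__ == '__main__':
    import tempfile

    ar = ToyArchive(tempfile.mkdtemp())
    ar.add_data('first document text', meta={'example': 'stuff', 'otherotherstuff': True})
    ar.add_data('second document text')
    print(ar.commit())  # path of the written .jsonl.gz file
```

Holding everything in memory keeps the sketch short; a more realistic version would stream records into a temporary file and only rename it to its final name on commit, so an interrupted run never looks like a finished archive.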

To read:

from lm_dataformat import Reader

rdr = Reader('input_dir_or_file')

for doc in rdr.stream_data():
  # do something with the document (a string); printing is just a placeholder
  print(doc)
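On the read side, a streaming reader can be sketched as a generator that yields one document at a time, so a large corpus never has to fit in memory. This is a hypothetical illustration, not the library's `Reader`: the function name `toy_stream_data`, its `get_meta` switch, and the `.jsonl.gz` layout with `text`/`meta` fields are assumptions chosen to match a gzip-plus-JSON-Lines writer.

```python
import gzip
import json
import os


def toy_stream_data(path, get_meta=False):
    """Hypothetical reader: lazily yield documents from a .jsonl.gz file or a directory of them."""
    if os.path.isdir(path):
        # Only pick up archive files, in sorted order so iteration is deterministic.
        files = sorted(
            os.path.join(path, name)
            for name in os.listdir(path)
            if name.endswith('.jsonl.gz')
        )
    else:
        files = [path]

    for file in files:
        with gzip.open(file, 'rt', encoding='utf-8') as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines rather than failing on them
                record = json.loads(line)
                if get_meta:
                    yield record['text'], record.get('meta', {})
                else:
                    yield record['text']
```

Usage would look like `for text, meta in toy_stream_data('output_dir', get_meta=True): ...`. Because it is a generator, no file is opened until iteration begins.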