Dataset: Python-code-docstring (DOI: https://doi.org/10.5281/zenodo.7202649)
November 27, 2023
The original code in the Python dataset is not executable under Python 3. One optional way to deal with this is to derive runnable Python code from the raw data.
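To illustrate what "not executable under Python 3" means in practice, here is a minimal sketch that checks whether a snippet parses with the standard `ast` module. This is not the repository's cleaning script; the helper name `is_python3_parsable` and the sample snippets are hypothetical, chosen only for illustration.

```python
import ast


def is_python3_parsable(source: str) -> bool:
    """Return True if `source` is syntactically valid for the running Python 3 interpreter.

    Hypothetical helper for illustration; it only checks syntax, not runtime behavior.
    """
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        # SyntaxError: Python 2-only syntax or otherwise malformed code.
        # ValueError: raised by some Python versions for source containing null bytes.
        return False


# A Python 2 style print statement fails to parse; the Python 3 call form succeeds.
print(is_python3_parsable('print "hello"\n'))   # False
print(is_python3_parsable('print("hello")\n'))  # True
```

A syntax check like this only filters out code that cannot be parsed at all; code that parses may still fail at runtime (for example, because of renamed standard-library modules).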
Step 1: Download the pre-processed and raw (python_wan) datasets.
bash dataset/python_wan/download.sh
Step 2: Clean the raw code files.
python -m dataset.python_wan.clean
Step 3: Move the code, code_tokens, docstring, and docstring_tokens attributes to ~/python_wan/flatten/*.
python -m dataset.python_wan.attributes_cast
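As a rough picture of what flattening attributes can look like, the sketch below splits records into one file per attribute. It is an assumption-laden illustration, not the behavior of `dataset.python_wan.attributes_cast`: the input is assumed to be a JSON Lines file whose records contain the four keys, and the function name `flatten_attributes` and the output file names are hypothetical.

```python
import json
from pathlib import Path

# The four attributes named in Step 3.
ATTRIBUTES = ("code", "code_tokens", "docstring", "docstring_tokens")


def flatten_attributes(jsonl_path, out_dir):
    """Write each attribute of every record to its own file, one JSON value per line.

    Hypothetical helper: assumes `jsonl_path` is a JSON Lines file in which every
    record has all four keys. Returns the number of records written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    writers = {
        name: open(out_dir / f"{name}.jsonl", "w", encoding="utf-8")
        for name in ATTRIBUTES
    }
    count = 0
    try:
        with open(jsonl_path, encoding="utf-8") as reader:
            for line in reader:
                if not line.strip():
                    continue  # skip blank lines
                record = json.loads(line)
                for name in ATTRIBUTES:
                    # json.dumps keeps each value on a single line, so line i of
                    # every output file belongs to the same record.
                    writers[name].write(json.dumps(record[name]) + "\n")
                count += 1
    finally:
        for writer in writers.values():
            writer.close()
    return count
```

Keeping the attribute files line-aligned means that the i-th line of each file refers to the same example, which makes it easy to pair code with its docstring later.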
Step 4 (optional): Alternatively, instead of Steps 1 to 3, you can download our processed Python (Wan) dataset directly.
bash dataset/python_wan/lazy_download.sh