Build SceneInstruct Dataset
November 29, 2024
Follow the instructions below to reproduce how SceneInstruct was built.
Model Preparation
- Llama-3.1-70B-Instruct: Download the weights of Llama-3.1-70B-Instruct from the HF Repo. To serve Llama-3.1-70B-Instruct with vLLM:
    vllm serve <Llama-3.1-70B path> --tensor_parallel_size 2
- OpenAI API key: Create a file named openai_key and add your API key to it.
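Before moving on, it can help to confirm that both prerequisites are in place. The following is a minimal sketch, not part of the repository: it assumes the key file is named openai_key in the current directory and that vLLM is listening on its default address, http://localhost:8000, with its OpenAI-compatible model listing route. The helper names `read_api_key`, `models_endpoint`, and `list_served_models` are hypothetical.

```python
import json
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlopen


def read_api_key(path="openai_key"):
    """Return the API key stored in the key file, without surrounding whitespace."""
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{path} is empty; add your OpenAI API key to it")
    return key


def models_endpoint(base_url="http://localhost:8000"):
    """Build the URL of the OpenAI-compatible model listing route."""
    # Assumption: the server runs on vLLM's default host and port.
    return urljoin(base_url.rstrip("/") + "/", "v1/models")


def list_served_models(base_url="http://localhost:8000", timeout=10):
    """Ask the running server which model ids it is serving."""
    with urlopen(models_endpoint(base_url), timeout=timeout) as response:
        payload = json.load(response)
    return [entry["id"] for entry in payload.get("data", [])]


if __name__ == "__main__":
    read_api_key()  # raises if the key file is missing or empty
    print("OpenAI key file found")
    print("Served models:", list_served_models())
```

If the server was started on another host or port, pass the matching base URL to `list_served_models`.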
Create Scene Descriptions with Evol-Instruct
- Deploy Llama-3.1-70B-Instruct following Model Preparation.
- Set <model-checkpoint-path> in create_descriptions.py to your model path.
- Run the following command:
    python create_descriptions.py \
        --num-prompts-needed 3000 # the number of new descriptions to be created
- The generated descriptions are saved in data_prompt.jsonl by default.
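To check the result of this step, the output file can be loaded and counted. This is a rough sketch that assumes only that data_prompt.jsonl is in JSON Lines format (one JSON value per line), as the extension suggests. The record schema is not documented here, so the code treats each record as an opaque value. `load_jsonl` is a hypothetical helper, not a function from the repository.

```python
import json
from pathlib import Path


def load_jsonl(path="data_prompt.jsonl"):
    """Parse a JSON Lines file into a list, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Report the offending line so a truncated write is easy to locate.
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({err.msg})"
                ) from err
    return records


if __name__ == "__main__":
    descriptions = load_jsonl()
    print(f"{len(descriptions)} records loaded")
    if descriptions:
        print("First record:", descriptions[0])
```

Comparing the printed count with the value passed to --num-prompts-needed is a quick way to see whether the run completed.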
Collect SceneGenAgent Trajectories
- Deploy the models following Model Preparation.
- Set <model-checkpoint-path> in create_descriptions.py to your model path.
- Run the following commands:
    python collect_before_assign_placement.py
    python collect_assign_placement.py
- The generated SceneInstruct dataset is saved in three files: data_prompt_assign_placement.jsonl, data_prompt_check_positional_error.jsonl, and data_prompt_fix_positional_error.jsonl.
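Once both scripts finish, a short check can confirm that all three files were produced. The sketch below is an illustration under two assumptions: the files are written to the directory the scripts were run from, and each holds one record per line. `count_records` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# The three file names listed above.
OUTPUT_FILES = (
    "data_prompt_assign_placement.jsonl",
    "data_prompt_check_positional_error.jsonl",
    "data_prompt_fix_positional_error.jsonl",
)


def count_records(directory="."):
    """Map each expected file name to its non-blank line count, or None if missing."""
    counts = {}
    for name in OUTPUT_FILES:
        path = Path(directory) / name
        if path.is_file():
            with path.open(encoding="utf-8") as f:
                counts[name] = sum(1 for line in f if line.strip())
        else:
            counts[name] = None
    return counts


if __name__ == "__main__":
    for name, count in count_records().items():
        status = "missing" if count is None else f"{count} records"
        print(f"{name}: {status}")
```

A file reported as missing usually means the corresponding collection script has not been run yet or wrote its output to a different directory.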