LLM Customization
January 17, 2024
We provide a user-friendly way to evaluate your own model.
Evaluate Your Model
Step 1: Add your model configs
You should add your model config to eval_configs/main_results_all_tasks.yaml, for example:
codellama-34b:
  name: vllm
  engine: [PATH TO YOUR MODEL]
  max_tokens: 100
  temperature: 0
  top_p: 1
  stop:
  context_length: 16384
  dtype: float32
  ngpu: 4
  use_parser: True
The config arguments are as follows:
- name: name of the inference framework (e.g., vllm, hg, gpt, gpt_azure, claude)
- engine: path to your model or the Hugging Face model name
- max_tokens: maximum number of newly generated tokens
- temperature: sampling temperature for generation
- top_p: top-p (nucleus sampling) threshold for generation
- stop: stop tokens for generation
- context_length: maximum context length of the LLM
- dtype: float32 or float16
- ngpu: number of GPUs to use; this argument only applies to the vllm framework
- use_parser: bool, whether to post-process the generated actions
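Before running a long evaluation, it can help to sanity-check a config entry. The sketch below is a hypothetical `ModelConfig` dataclass, not AgentBoard's own loader: its field names mirror the YAML keys above, the allowed framework names and dtypes come from the argument list, and the remaining range checks (for example, requiring `max_tokens < context_length`) are my own assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

# Allowed values taken from the argument list above; illustrative, not exhaustive.
FRAMEWORKS = {"vllm", "hg", "gpt", "gpt_azure", "claude"}
DTYPES = {"float32", "float16"}


@dataclass
class ModelConfig:
    """Hypothetical mirror of one model entry in main_results_all_tasks.yaml."""

    name: str
    engine: str
    max_tokens: int
    temperature: float
    top_p: float
    stop: Optional[List[str]]  # an empty `stop:` line in YAML loads as None
    context_length: int
    dtype: str
    ngpu: int
    use_parser: bool

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means the entry looks sane."""
        issues = []
        if self.name not in FRAMEWORKS:
            issues.append(f"unknown framework {self.name!r}")
        if self.dtype not in DTYPES:
            issues.append(f"dtype must be float32 or float16, got {self.dtype!r}")
        if self.temperature < 0:
            issues.append("temperature must be non-negative")
        if not 0 < self.top_p <= 1:
            issues.append("top_p must be in (0, 1]")
        # Assumed sanity check: generation must fit inside the context window.
        if self.max_tokens <= 0 or self.context_length <= self.max_tokens:
            issues.append("need 0 < max_tokens < context_length")
        if self.ngpu < 1:
            issues.append("ngpu must be at least 1")
        return issues


# The codellama-34b example from Step 1, with a placeholder engine path.
codellama = ModelConfig(
    name="vllm",
    engine="/path/to/codellama-34b",
    max_tokens=100,
    temperature=0,
    top_p=1,
    stop=None,
    context_length=16384,
    dtype="float32",
    ngpu=4,
    use_parser=True,
)
print(codellama.problems())  # → []
```

Returning a list of problems instead of raising on the first one reports every issue in an entry at once.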
Step 2: Run inference with vLLM or Hugging Face
Please check whether your model is supported by vLLM.
We recommend vLLM because it is usually much faster than the naive model.generate() in Hugging Face transformers.
If you use vLLM, set name in the config above to vllm; otherwise, set it to hg.
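One rough way to decide between the two backends is to compare the `architectures` field of a local Hugging Face checkpoint's `config.json` against the architectures vLLM supports. The sketch below assumes a local model directory. `choose_backend` is a hypothetical helper, not part of AgentBoard, and `VLLM_ARCHITECTURES` is an illustrative subset only; check vLLM's supported-models list for the real one.

```python
import json
from pathlib import Path

# Illustrative subset of architectures vLLM supports; NOT the complete list.
VLLM_ARCHITECTURES = {"LlamaForCausalLM", "MistralForCausalLM"}


def choose_backend(model_dir: str) -> str:
    """Return "vllm" if the checkpoint's architecture is in the known set, else "hg"."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open() as f:
        architectures = json.load(f).get("architectures", [])
    if any(arch in VLLM_ARCHITECTURES for arch in architectures):
        return "vllm"
    # Unknown or missing architecture: fall back to the Hugging Face backend.
    return "hg"
```

For a CodeLlama checkpoint, whose `config.json` lists `LlamaForCausalLM`, this returns "vllm". The result is the value you would put in the `name` field.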
Note: Hugging Face and vLLM inference can produce different results because their implementations differ.
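When `name` is vllm, the Step 1 fields line up with arguments of vLLM's `LLM` and `SamplingParams` classes. The sketch below shows one plausible mapping. `to_vllm_kwargs` is a hypothetical helper, not AgentBoard's code. The keyword names are real vLLM arguments, but pairing `ngpu` with `tensor_parallel_size` and `context_length` with `max_model_len` is an assumption about how AgentBoard wires them.

```python
from typing import Any, Dict, Tuple


def to_vllm_kwargs(cfg: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split one config entry into kwargs for vllm.LLM and vllm.SamplingParams."""
    engine_kwargs = {
        "model": cfg["engine"],                  # local path or Hugging Face model name
        "tensor_parallel_size": cfg["ngpu"],     # assumed: shard the model across ngpu GPUs
        "dtype": cfg["dtype"],                   # "float32" or "float16"
        "max_model_len": cfg["context_length"],  # assumed: cap on prompt + generated tokens
    }
    sampling_kwargs = {
        "max_tokens": cfg["max_tokens"],
        "temperature": cfg["temperature"],
        "top_p": cfg["top_p"],
        "stop": cfg.get("stop") or None,  # empty `stop:` means no stop strings
    }
    return engine_kwargs, sampling_kwargs


# Same values as the codellama-34b entry in Step 1 (engine path is a placeholder).
entry = {
    "engine": "/path/to/codellama-34b",
    "max_tokens": 100, "temperature": 0, "top_p": 1, "stop": None,
    "context_length": 16384, "dtype": "float32", "ngpu": 4,
}
engine_kwargs, sampling_kwargs = to_vllm_kwargs(entry)
```

With vLLM installed, these dictionaries would be used as `llm = LLM(**engine_kwargs)` followed by `llm.generate(prompts, SamplingParams(**sampling_kwargs))`. Keeping the mapping in one function makes it easy to compare against the settings passed to the Hugging Face backend when the two disagree.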
Step 3: Write the prompt template for your model
If your model needs a customized input template, add it to agentboard/prompts/prompt_template, for example:
"codellama-34b":
"""
<s>[INST]{system_prompt}{prompt}[/INST]
""",
The template arguments are as follows:
- system_prompt: the system prompt of your agent
- prompt: the user prompt of your agent
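The `{system_prompt}` and `{prompt}` placeholders look like Python `str.format` fields. The sketch below assumes AgentBoard fills them that way, which this page does not confirm, and renders the codellama-34b template from the example. `render_prompt` and the example strings are hypothetical.

```python
# Template string copied from the codellama-34b example in this step.
CODELLAMA_TEMPLATE = """
<s>[INST]{system_prompt}{prompt}[/INST]
"""


def render_prompt(template: str, system_prompt: str, prompt: str) -> str:
    """Fill the two documented placeholders using str.format."""
    return template.format(system_prompt=system_prompt, prompt=prompt)


text = render_prompt(
    CODELLAMA_TEMPLATE,
    system_prompt="You are a household agent. ",
    prompt="Put the apple in the fridge.",
)
print(text)
```

Under this assumption, the template itself must write any literal braces as `{{` and `}}`. Braces inside the substituted prompt values need no escaping.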