:paperclip: How to configure MINT to evaluate your LLM
September 20, 2023
Add an evaluated LLM
MINT uses different classes to abstract over the APIs of different LLMs. You can find the list of implemented LLMs in mint/agents/__init__.py.
For closed-source models:
- OpenAILMAgent for the OpenAI API
- BardLMAgent for Bard
- ClaudeLMAgent for Anthropic Claude
For open-source models, we provide VLLMAgent, which can evaluate any LLM that can be served behind an OpenAI-compatible API with VLLM or FastChat.
If you want to evaluate an open-source LLM that can be served with VLLM or FastChat: First, refer to docs/SERVING.md to learn how to serve your model. Then, modify mint/configs/config_variables.py by adding a dictionary describing the model to be evaluated to EVALUATED_MODEL_LIST.
```python
# For Chat Model
{
    "agent_class": "VLLMAgent",
    "config": {
        "model_name": "<YOUR_MODEL_NAME>",
        "chat_mode": True,
        "max_tokens": 512,
        "temperature": 0.0,
        "openai.api_base": "<YOUR_API_BASE>",
        "add_system_message": False,
    },
},

# For Completion-only Model
{
    "agent_class": "VLLMAgent",
    "config": {
        "model_name": "Llama-2-70b-hf",
        "chat_mode": False,
        "max_tokens": 512,
        "temperature": 0.0,
        "openai.api_base": "<YOUR_API_BASE>",
        "add_system_message": False,
    },
},
```
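Before launching an evaluation, it can help to sanity-check a new entry. The snippet below is a standalone sketch, not part of MINT; it assumes the key set shown in the examples above is required and simply flags anything missing:

```python
# Standalone sanity check for EVALUATED_MODEL_LIST entries (illustrative only;
# MINT itself does not ship this helper).
REQUIRED_CONFIG_KEYS = {"model_name", "chat_mode", "max_tokens",
                        "temperature", "openai.api_base", "add_system_message"}

def validate_entry(entry):
    """Return a list of problems found in one model entry."""
    problems = []
    if entry.get("agent_class") != "VLLMAgent":
        problems.append("agent_class should be 'VLLMAgent' for served models")
    missing = REQUIRED_CONFIG_KEYS - set(entry.get("config", {}))
    if missing:
        problems.append(f"missing config keys: {sorted(missing)}")
    return problems

entry = {
    "agent_class": "VLLMAgent",
    "config": {
        "model_name": "Llama-2-70b-hf",
        "chat_mode": False,
        "max_tokens": 512,
        "temperature": 0.0,
        "openai.api_base": "http://localhost:8000/v1",
        "add_system_message": False,
    },
}
print(validate_entry(entry))  # → []
```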
If you want to evaluate another closed-source LLM whose API schema differs from the existing implementations: You need to implement a new agent class that inherits from LMAgent (PRs welcome!).
You can use mint/agents/openai_lm_agent.py as an example, then add the model configuration to mint/configs/config_variables.py as above.
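As a rough illustration of that step, a new agent might look like the sketch below. The base-class interface shown (a constructor taking a config dict and an `act` method taking the conversation state) is an assumption made for this sketch; check mint/agents for the actual LMAgent signature.

```python
# Illustrative sketch only: the real LMAgent lives in mint/agents and its
# interface may differ. A minimal stand-in base class is stubbed here so
# the example runs on its own.
class LMAgent:
    def __init__(self, config):
        self.config = config

    def act(self, state):
        raise NotImplementedError

class MyClosedSourceAgent(LMAgent):
    """Wraps a hypothetical proprietary API behind the agent interface."""
    def act(self, state):
        prompt = "\n".join(state)             # flatten history (assumption)
        # response = my_api.complete(prompt)  # hypothetical API call
        response = f"[model reply to {len(state)} message(s)]"
        return response

agent = MyClosedSourceAgent({"model_name": "my-model"})
print(agent.act(["Hello", "Solve 1+1"]))
```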
Add a feedback-providing LLM
We implemented three different feedback agent classes.
If you want to use an existing open-source model compatible with VLLM or FastChat, you can add a configuration similar to the above to FEEDBACK_PROVIDER_LIST in mint/configs/config_variables.py.
```python
FEEDBACK_PROVIDER_LIST = [
    ...
    {
        "agent_class": "VLLMFeedbackAgent",
        "model_name": "<YOUR_MODEL_NAME>",
        "openai.api_base": "<YOUR_API_BASE>",
        "chat_mode": True,  # Set to False if your model is completion-only
    },
    ...
]
```
If needed, you can use these classes as examples to implement your own feedback agent class (PRs welcome!). Then, add this model configuration to FEEDBACK_PROVIDER_LIST in mint/configs/config_variables.py. For example:
```python
FEEDBACK_PROVIDER_LIST = [
    ...
    {
        # Your custom feedback provider
        "agent_class": "<YOUR_FEEDBACK_AGENT_CLASS>",
        "model_name": "<YOUR_FEEDBACK_MODEL_NAME>",
    },
    ...
]
```
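To make the "implement your own feedback agent" step concrete, here is a toy provider. Everything in it is an assumption for illustration (the class names, the `give_feedback` method, the binary "correct"/"incorrect" convention); the real interface is defined in mint/agents.

```python
# Illustrative only: a hypothetical custom feedback provider that compares
# an attempt against the ground truth and emits binary feedback.
class FeedbackAgent:
    def __init__(self, config):
        self.config = config

class BinaryFeedbackAgent(FeedbackAgent):
    """Hypothetical provider emitting ground-truth-based binary feedback."""
    def give_feedback(self, attempt, ground_truth):
        return "correct" if attempt.strip() == ground_truth.strip() else "incorrect"

fb = BinaryFeedbackAgent({"model_name": "my-feedback-model"})
print(fb.give_feedback("42", "42"))  # → correct
```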
Change Experiment Configurations
Optionally, you can change different experiment settings in mint/configs/config_variables.py.
ENV_CONFIGS
This specifies the settings of the environment. Here is an example:
```python
ENV_CONFIGS = [
    ...,
    {
        "max_steps": 5,
        "use_tools": True,
        "max_propose_solution": 2,
        "count_down": True,
    },
    ...
]
```
where:

- max_steps corresponds to the interaction budget (k) in the paper.
- use_tools should always be True (the no-tool setting is not yet implemented).
- max_propose_solution is the maximum number of solutions the evaluated LLM can propose.
- count_down controls whether the environment counts down the remaining steps (read Section 2 of the paper for more detail).
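The interplay of these two budgets can be sketched with a toy interaction loop. This is not MINT code; it only assumes that an episode ends either when the step budget runs out or when the agent has used up its allowed solution proposals:

```python
def run_episode(policy, max_steps=5, max_propose_solution=2):
    """Toy loop: the agent may act up to max_steps times, and at most
    max_propose_solution of those acts may be solution proposals."""
    proposals = 0
    for step in range(max_steps):
        action = policy(step)               # returns "tool" or "propose"
        if action == "propose":
            proposals += 1
            if proposals >= max_propose_solution:
                return step + 1, proposals  # proposal budget exhausted
    return max_steps, proposals             # step budget exhausted

# An agent that proposes solutions at steps 1 and 3 ends after 4 steps:
print(run_episode(lambda s: "propose" if s in (1, 3) else "tool"))  # → (4, 2)
```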
FEEDBACK_TYPES
This specifies the types of feedback we instruct the feedback-providing LLM to provide. Here are all the settings we currently support:
```python
FEEDBACK_TYPES = [
    {"pseudo_human_feedback": "no_GT", "feedback_form": "textual"},  # default setting
    {"pseudo_human_feedback": "no_GT", "feedback_form": "binary"},
    {"pseudo_human_feedback": "GT", "feedback_form": "binary"},
    {"pseudo_human_feedback": "GT", "feedback_form": "textual"},
]
```
- pseudo_human_feedback specifies whether we provide a ground-truth solution of the problem to the feedback-providing LLM: no_GT means we do not provide one (default setting), and GT means we do.
- feedback_form specifies the form of feedback: textual means the provider gives free-form textual feedback (default setting), and binary means we instruct the provider to give binary feedback.
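The four supported settings are simply the 2×2 grid of these two fields, which can be reconstructed programmatically (ordering here is the product order, not the order in the config file):

```python
from itertools import product

# Rebuild the 2x2 grid of supported feedback settings.
FEEDBACK_TYPES = [
    {"pseudo_human_feedback": gt, "feedback_form": form}
    for gt, form in product(["no_GT", "GT"], ["textual", "binary"])
]
print(len(FEEDBACK_TYPES))  # → 4
```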