🤖Supported configs & options

April 3, 2025 · View on GitHub

Symbols: ✅ - Supported, ❌ - Not supported, 📌 - Plan to support

OpenAI ✅

API configurations

Field	Description
API Key	The API key for your OpenAI API.
Model	ID of the model to use.

Conversation options

Option	Description	Supported
frequency_penalty	Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.	✅
max_tokens	The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length.	✅
presence_penalty	Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.	✅
temperature	What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.	✅
top_p	An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.	✅
stream	If set, partial message deltas will be sent, like in ChatGPT.	✅
user	A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.	✅
response_format	An object specifying the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models newer than gpt-3.5-turbo-1106.	📌
seed	If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.	📌
stop	Up to 4 sequences where the API will stop generating further tokens.	📌
tools	A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.	❌
tool_choice	Controls which (if any) function is called by the model. none means the model will not call a function and instead generates a message. auto means the model can pick between generating a message or calling a function. Specifying a particular function via {"type": "function", "function": {"name": "my_function"}} forces the model to call that function. none is the default when no functions are present. auto is the default if functions are present.	❌
logit_bias	Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.	❌
logprobs	Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. This option is currently not available on the gpt-4-vision-preview model.	❌
top_logprobs	An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.	❌
n	How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.	❌

References

OpenAI Documentation

Microsoft Azure ✅

API configurations

Field	Description
API Key	The API key for your Azure OpenAI API.
Endpoint	The endpoint for your Azure OpenAI API.
API version	The API version to use for this operation. This follows the YYYY-MM-DD or YYYY-MM-DD-preview format.
Deployment ID	The name of your model deployment.

Conversation options

Option	Description	Supported
max_tokens	The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens can't exceed the model's context length.	✅
temperature	What sampling temperature to use, between 0 and 2. Higher values mean the model takes more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer. We generally recommend altering this or top_p but not both.	✅
top_p	An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.	✅
presence_penalty	Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.	✅
frequency_penalty	Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.	✅
stream	If set, partial message deltas will be sent, like in ChatGPT.	✅
user	A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.	✅
suffix	The suffix that comes after a completion of inserted text.	📌
echo	Echo back the prompt in addition to the completion. This parameter cannot be used with gpt-35-turbo.	📌
stop	Up to four sequences where the API will stop generating further tokens. The returned text won't contain the stop sequence. For GPT-4 Turbo with Vision, up to two sequences are supported.	📌
logit_bias	Modify the likelihood of specified tokens appearing in the completion. Accepts a json object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. You can use this tokenizer tool (which works for both GPT-2 and GPT-3) to convert text to token IDs. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. As an example, you can pass {"50256": -100} to prevent the <\|endoftext\|> token from being generated.	❌
n	How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.	❌
logprobs	Include the log probabilities on the logprobs most likely tokens, as well the chosen tokens. For example, if logprobs is 10, the API will return a list of the 10 most likely tokens. The API will always return the logprob of the sampled token, so there might be up to logprobs+1 elements in the response. This parameter cannot be used with gpt-35-turbo.	❌
best_of	Generates best_of completions server-side and returns the "best" (the one with the lowest log probability per token). Results can't be streamed. When used with n, best_of controls the number of candidate completions and n specifies how many to return – best_of must be greater than n. Note: Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for max_tokens and stop. This parameter cannot be used with gpt-35-turbo.	❌

References

Azure Documentation

Anthropic Claude ✅

API configurations

Field	Description
api-key	The API key for your Anthropic API.
anthropic-version	The version of Anthropic to use.
model	The Anthropic model to use.

Conversation options

Option	Description	Supported
max_tokens	The maximum number of tokens to generate before stopping.	✅
temperature	Amount of randomness injected into the response. Defaults to 1.0. Ranges from 0.0 to 1.0. Use temperature closer to 0.0 for analytical / multiple choice, and closer to 1.0 for creative and generative tasks. We generally recommend altering this or top_p but not both.	✅
top_p	Use nucleus sampling. Recommended for advanced use cases only. You usually only need to use temperature.	✅
stream	Whether to incrementally stream the response using server-sent events.	✅
user	An object describing metadata about the request. metadata.user_id: An external identifier for the user who is associated with the request.	✅
stop_sequences	Custom text sequences that will cause the model to stop generating.	📌
top_k	Only sample from the top K options for each subsequent token. Recommended for advanced use cases only. You usually only need to use temperature.	📌
tools	Definitions of tools that the model may use.	❌
tool_choice	How the model should use the provided tools.	❌

Ollama ✅

API configurations

Field	Description
Endpoint	The endpoint for your Azure OpenAI API.
Model	The model to use.

Conversation options

Option	Description	Supported
num_ctx	Number of input tokens. Sets the size of the context window used to generate the next token. (Default: 2048)	✅
num-predict	Number of output tokens. Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context)	✅
temperature	The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8)	✅
top_p	Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)	✅
mirostat	Enable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)	📌
mirostat_eta	Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. (Default: 0.1)	📌
mirostat_tau	Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0)	📌
repeat_last_n	Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)	📌
repeat_penalty	Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)	📌
seed	Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0)	📌
stop	Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate stop parameters in a modelfile.	📌
tfs_z	Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. (default: 1)	📌
top_k	Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)	📌
min_p	Alternative to the top_p, and aims to ensure a balance of quality and variety. The parameter p represents the minimum probability for a token to be considered, relative to the probability of the most likely token. For example, with p=0.05 and the most likely token having a probability of 0.9, logits with a value less than 0.045 are filtered out. (Default: 0.0)	📌

References

Ollama Modelfile

Google Gemini

📌 Plan to support