πŸ€–Supported configs & options

April 3, 2025 Β· View on GitHub

en-icon zh-hans-icon fr-icon

Symbols: βœ… - Supported, ❌ - Not supported, πŸ“Œ - Plan to support

OpenAI βœ…

API configurations

FieldDescription
API KeyThe API key for your OpenAI API.
ModelID of the model to use.

Conversation options

OptionDescriptionSupported
frequency_penaltyNumber between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.βœ…
max_tokensThe maximum number of tokens that can be generated in the chat completion.
The total length of input tokens and generated tokens is limited by the model's context length.
βœ…
presence_penaltyNumber between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.βœ…
temperatureWhat sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
We generally recommend altering this or top_p but not both.
βœ…
top_pAn alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature but not both.
βœ…
streamIf set, partial message deltas will be sent, like in ChatGPT.βœ…
userA unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.βœ…
response_formatAn object specifying the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models newer than gpt-3.5-turbo-1106.πŸ“Œ
seedIf specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.πŸ“Œ
stopUp to 4 sequences where the API will stop generating further tokens.πŸ“Œ
toolsA list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.❌
tool_choiceControls which (if any) function is called by the model. none means the model will not call a function and instead generates a message. auto means the model can pick between generating a message or calling a function. Specifying a particular function via {"type": "function", "function": {"name": "my_function"}} forces the model to call that function.
none is the default when no functions are present. auto is the default if functions are present.
❌
logit_biasModify the likelihood of specified tokens appearing in the completion.
Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
❌
logprobsWhether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. This option is currently not available on the gpt-4-vision-preview model.❌
top_logprobsAn integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.❌
nHow many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.❌

References

Microsoft Azure βœ…

API configurations

FieldDescription
API KeyThe API key for your Azure OpenAI API.
EndpointThe endpoint for your Azure OpenAI API.
API versionThe API version to use for this operation. This follows the YYYY-MM-DD or YYYY-MM-DD-preview format.
Deployment IDThe name of your model deployment.

Conversation options

OptionDescriptionSupported
max_tokensThe maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens can't exceed the model's context length.βœ…
temperatureWhat sampling temperature to use, between 0 and 2. Higher values mean the model takes more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer. We generally recommend altering this or top_p but not both.βœ…
top_pAn alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.βœ…
presence_penaltyNumber between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.βœ…
frequency_penaltyNumber between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.βœ…
streamIf set, partial message deltas will be sent, like in ChatGPT.βœ…
userA unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.βœ…
suffixThe suffix that comes after a completion of inserted text.πŸ“Œ
echoEcho back the prompt in addition to the completion. This parameter cannot be used with gpt-35-turbo.πŸ“Œ
stopUp to four sequences where the API will stop generating further tokens. The returned text won't contain the stop sequence. For GPT-4 Turbo with Vision, up to two sequences are supported.πŸ“Œ
logit_biasModify the likelihood of specified tokens appearing in the completion. Accepts a json object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. You can use this tokenizer tool (which works for both GPT-2 and GPT-3) to convert text to token IDs. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. As an example, you can pass {"50256": -100} to prevent the <|endoftext|> token from being generated.❌
nHow many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.❌
logprobsInclude the log probabilities on the logprobs most likely tokens, as well the chosen tokens. For example, if logprobs is 10, the API will return a list of the 10 most likely tokens. The API will always return the logprob of the sampled token, so there might be up to logprobs+1 elements in the response. This parameter cannot be used with gpt-35-turbo.❌
best_ofGenerates best_of completions server-side and returns the "best" (the one with the lowest log probability per token). Results can't be streamed. When used with n, best_of controls the number of candidate completions and n specifies how many to return – best_of must be greater than n. Note: Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for max_tokens and stop. This parameter cannot be used with gpt-35-turbo.❌

References

Anthropic Claude βœ…

API configurations

FieldDescription
api-keyThe API key for your Anthropic API.
anthropic-versionThe version of Anthropic to use.
modelThe Anthropic model to use.

Conversation options

OptionDescriptionSupported
max_tokensThe maximum number of tokens to generate before stopping.βœ…
temperatureAmount of randomness injected into the response.
Defaults to 1.0. Ranges from 0.0 to 1.0. Use temperature closer to 0.0 for analytical / multiple choice, and closer to 1.0 for creative and generative tasks.
We generally recommend altering this or top_p but not both.
βœ…
top_pUse nucleus sampling.
Recommended for advanced use cases only. You usually only need to use temperature.
βœ…
streamWhether to incrementally stream the response using server-sent events.βœ…
userAn object describing metadata about the request.
metadata.user_id: An external identifier for the user who is associated with the request.
βœ…
stop_sequencesCustom text sequences that will cause the model to stop generating.πŸ“Œ
top_kOnly sample from the top K options for each subsequent token.
Recommended for advanced use cases only. You usually only need to use temperature.
πŸ“Œ
toolsDefinitions of tools that the model may use.❌
tool_choiceHow the model should use the provided tools.❌

Ollama βœ…

API configurations

FieldDescription
EndpointThe endpoint for your Azure OpenAI API.
ModelThe model to use.

Conversation options

OptionDescriptionSupported
num_ctxNumber of input tokens. Sets the size of the context window used to generate the next token. (Default: 2048)βœ…
num-predictNumber of output tokens. Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context)βœ…
temperatureThe temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8)βœ…
top_pWorks together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)βœ…
mirostatEnable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)πŸ“Œ
mirostat_etaInfluences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. (Default: 0.1)πŸ“Œ
mirostat_tauControls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0)πŸ“Œ
repeat_last_nSets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)πŸ“Œ
repeat_penaltySets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)πŸ“Œ
seedSets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0)πŸ“Œ
stopSets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate stop parameters in a modelfile.πŸ“Œ
tfs_zTail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. (default: 1)πŸ“Œ
top_kReduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)πŸ“Œ
min_pAlternative to the top_p, and aims to ensure a balance of quality and variety. The parameter p represents the minimum probability for a token to be considered, relative to the probability of the most likely token. For example, with p=0.05 and the most likely token having a probability of 0.9, logits with a value less than 0.045 are filtered out. (Default: 0.0)πŸ“Œ

References

Google Gemini

πŸ“Œ Plan to support