# Generation Options

Advanced model and provider-specific options for fine-tuning generation behavior.
## Overview

Beyond the standard completion options (`temperature`, `maxTokens`, etc.), multi-llm-ts supports provider-specific options that enable advanced features such as reasoning modes, response continuity, and custom parameters.
## Common Options

These options work across all or most providers:
### maxTokens

Maximum number of tokens to generate:

```ts
const response = await model.complete(messages, {
  maxTokens: 1000
})
```

Supported: All providers
### temperature

Randomness in generation (0-2):

```ts
const response = await model.complete(messages, {
  temperature: 0.7 // More creative
})
```

Supported: All providers (with model-specific constraints)
### top_k

Top-K sampling for token selection:

```ts
const response = await model.complete(messages, {
  top_k: 40
})
```

Supported: Anthropic, Google, Ollama, Mistral AI

Note: For OpenAI, this enables `logprobs` and `top_logprobs` instead of affecting sampling.
### top_p

Top-P (nucleus) sampling:

```ts
const response = await model.complete(messages, {
  top_p: 0.9
})
```

Supported: Most providers
## OpenAI-Specific Options

Options for OpenAI models:
### useResponsesApi

Enable the OpenAI Responses API for multi-turn conversations:

```ts
const response = await model.complete(messages, {
  useResponsesApi: true
})

// Get the response ID
console.log(response.openAIResponseId)
```

Purpose: Enables conversation continuity with response IDs for follow-up requests.

See: OpenAI Responses API documentation
### responseId

Continue a previous conversation using its response ID:

```ts
const response = await model.complete(messages, {
  useResponsesApi: true,
  responseId: previousResponseId // Continue from previous response
})
```

Requires: `useResponsesApi: true`
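Putting the two options together, here is a minimal two-turn sketch, assuming `openAIResponseId` and `responseId` behave as described above:

```ts
// First turn: opt into the Responses API and capture the response ID
const first = await model.complete([new Message('user', 'Summarize this report.')], {
  useResponsesApi: true
})

// Second turn: continue the same conversation using the previous response ID
const followUp = await model.complete([new Message('user', 'Now list the key risks.')], {
  useResponsesApi: true,
  responseId: first.openAIResponseId
})
```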
### reasoningEffort

Control reasoning effort for OpenAI o1-series and DeepSeek reasoning models:

```ts
const response = await model.complete(messages, {
  reasoningEffort: 'high' // 'low' | 'medium' | 'high'
})
```

Supported models: o1-preview, o1-mini, o1, and DeepSeek reasoning models

Purpose: Balances speed vs. reasoning depth:

- `'low'`: Faster, less reasoning
- `'medium'`: Balanced
- `'high'`: Slower, deeper reasoning
### verbosity

Control output verbosity on models that support it:

```ts
const response = await model.complete(messages, {
  verbosity: 'medium' // 'low' | 'medium' | 'high'
})
```

Purpose: Controls how detailed the model's output is.
## Anthropic-Specific Options

Options for Anthropic Claude models:
### reasoning

Enable extended thinking mode for Claude:

```ts
const response = await model.complete(messages, {
  reasoning: true,
  reasoningBudget: 2048
})
```

Purpose: Enables Claude's extended thinking capability on models that support it.

Default: Automatically enabled for models with `capabilities.reasoning = true`; set to `false` to disable.
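Since reasoning is on by default for capable models, a short sketch of turning it off for a latency-sensitive call:

```ts
// Skip extended thinking for a quick, low-latency answer
const response = await model.complete(messages, {
  reasoning: false
})
```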
### reasoningBudget

Token budget for Claude's extended thinking:

```ts
const response = await model.complete(messages, {
  reasoning: true,
  reasoningBudget: 2048 // Tokens allocated for thinking
})
```

Default: 1024 tokens

Purpose: Controls how many tokens Claude can use for internal reasoning before generating the response.

Effect:

- Higher budget: More thorough reasoning, potentially better quality
- Lower budget: Faster responses, less reasoning depth
## Google-Specific Options

Options for Google Gemini models:
### thinkingBudget

Token budget for Gemini's thinking process:

```ts
const response = await model.complete(messages, {
  thinkingBudget: 1024
})
```

Purpose: Similar to Anthropic's `reasoningBudget`, controls the token allocation for Gemini's internal reasoning on models with `capabilities.reasoning = true`.

Effect: Enables `includeThoughts: true` and sets the thinking budget.
## Custom Provider Options

For provider-specific parameters not covered by the standard API:
### customOpts

Pass arbitrary options directly to the provider:

```ts
// Ollama example
const response = await model.complete(messages, {
  customOpts: {
    num_ctx: 8192,
    num_predict: 1024,
    repeat_penalty: 1.1,
    seed: 42
  }
})
```

Purpose: Allows access to any provider-specific parameter not exposed by the unified API.

Usage:

- Ollama: Maps to `options` in the request
- Other providers: Merged into the request payload (see the sketch after this list)

Example use cases:

- Ollama: `num_ctx`, `num_predict`, `repeat_penalty`, `seed`, etc.
- Custom API endpoints: Provider-specific experimental features
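As an illustration of the "merged into the request payload" path, the sketch below forwards an extra OpenAI request field that the unified API does not expose directly. Which keys each provider actually accepts is an assumption here; consult the provider's API reference for valid fields.

```ts
// Hypothetical OpenAI example: forward an extra request field via customOpts
const response = await model.complete(messages, {
  temperature: 0.7,
  customOpts: {
    seed: 1234 // OpenAI's deterministic-sampling seed, merged into the request body
  }
})
```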
## Complete Example

Combining multiple options:

```ts
import { igniteModel, loadModels, Message } from 'multi-llm-ts'

// config: your provider configuration
const models = await loadModels('anthropic', config)
const model = igniteModel('anthropic', models.chat[0], config)

const messages = [
  new Message('user', 'Solve this complex problem: ...')
]

// Abort controller for optional cancellation
const controller = new AbortController()

const response = await model.complete(messages, {
  // Standard options
  temperature: 1.0,
  maxTokens: 4000,
  top_p: 0.95,
  top_k: 40,

  // Anthropic-specific
  reasoning: true,
  reasoningBudget: 4096, // More thinking tokens

  // Cancel if needed
  abortSignal: controller.signal
})

console.log(response.content)
```

## Provider Support Matrix
| Option | OpenAI | Anthropic | Google | Ollama | Others |
|---|---|---|---|---|---|
| maxTokens | ✅ | ✅ | ✅ | ✅ | ✅ |
| temperature | ✅ | ✅ | ✅ | ✅ | ✅ |
| top_k | ⚠️¹ | ✅ | ✅ | ✅ | ✅ |
| top_p | ✅ | ✅ | ✅ | ✅ | ✅ |
| contextWindowSize | ❌ | ❌ | ❌ | ✅ | ❌ |
| useResponsesApi | ✅ | ❌ | ❌ | ❌ | ❌ |
| responseId | ✅ | ❌ | ❌ | ❌ | ❌ |
| reasoningEffort | ✅² | ❌ | ❌ | ❌ | ✅³ |
| verbosity | ✅² | ❌ | ❌ | ❌ | ❌ |
| reasoning | ❌ | ✅ | ❌ | ❌ | ❌ |
| reasoningBudget | ❌ | ✅ | ❌ | ❌ | ❌ |
| thinkingBudget | ❌ | ❌ | ✅ | ❌ | ❌ |
| customOpts | ✅ | ✅ | ✅ | ✅ | ✅ |

¹ OpenAI: Enables logprobs instead of affecting sampling
² OpenAI: o1-series models only
³ DeepSeek reasoning models support reasoningEffort
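Because support varies by provider, one pragmatic pattern is to branch on the engine name when building the options object. This is an illustrative sketch only: the engine identifiers `'openai'` and `'google'` are assumed to match the provider names accepted by `loadModels`, and the budget values are arbitrary.

```ts
// Illustrative helper: choose reasoning-related options per provider
// (option names follow the sections above)
function reasoningOptsFor(engine: string): Record<string, unknown> {
  switch (engine) {
    case 'openai':
      return { reasoningEffort: 'high' }
    case 'anthropic':
      return { reasoning: true, reasoningBudget: 4096 }
    case 'google':
      return { thinkingBudget: 2048 }
    default:
      return {} // other providers: no reasoning-specific options
  }
}

// Usage: spread into the completion options
const response = await model.complete(messages, {
  maxTokens: 1000,
  ...reasoningOptsFor('anthropic')
})
```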
## Next Steps

- Learn about Structured Output for JSON generation
- Review Abort Operations for cancellation
- See Types for complete type definitions