# Request Parameters
Complete reference for request parameters in the Chat Completions API.
## Overview
The Chat Completions API provides a unified interface for interacting with multiple LLM providers. All requests require at minimum a model identifier and a messages array.
**Base Endpoint:** `POST /chat/completions`

**Authentication:** Bearer token in the `Authorization` header.
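Putting the endpoint and auth together, a request can be assembled as in this Python sketch. The base URL and API key below are placeholders, not values defined by this reference:

```python
import json

BASE_URL = "https://api.example.com"  # placeholder host; substitute your gateway's URL

def build_chat_request(api_key, model, messages):
    """Assemble the URL, headers, and body for POST /chat/completions."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",  # bearer token auth
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

url, headers, body = build_chat_request(
    "sk-test", "gpt-4o", [{"role": "user", "content": "Hello"}]
)
print(url)
```

From here, any HTTP client can send the assembled request; the response parsing depends on whether `stream` is enabled (see below).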
## Required Parameters
| Parameter | Type | Description |
|---|---|---|
| `model` | string | The ID of the model to use (e.g., `gpt-4o`, `gemini-1.5-pro`, `claude-3-5-sonnet-20241022`) |
| `messages` | array | An array of message objects representing the conversation history |
## Optional Parameters
| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| `stream` | boolean | false | - | Enable streaming responses via Server-Sent Events |
| `temperature` | number | 1.0 | 0.0 - 2.0 | Sampling temperature controlling randomness |
| `max_tokens` | integer | model-dependent | varies | Maximum number of tokens to generate |
| `top_p` | number | 1.0 | 0.0 - 1.0 | Nucleus sampling threshold |
| `frequency_penalty` | number | 0 | -2.0 - 2.0 | Penalty on frequently repeated tokens |
| `presence_penalty` | number | 0 | -2.0 - 2.0 | Penalty on tokens already present, encouraging new topics |
| `stop` | string or array | null | - | Sequences where generation stops |
## Message Object Structure
Each message in the messages array must contain:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `role` | string | Yes | The role of the message author: `system`, `user`, `assistant`, or `tool` |
| `content` | string | Yes | The content of the message |
| `name` | string | No | Optional name of the participant |
| `tool_calls` | array | No | Tool calls generated by the model (for `role: assistant`) |
| `tool_call_id` | string | No | ID of the tool call being answered (for `role: tool`) |
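To illustrate how `tool_calls` and `tool_call_id` pair up, here is a hedged Python sketch of a message sequence. The inner shape of each tool call (the `type`/`function` fields, the function name, the arguments) follows the common OpenAI-style layout and may differ by provider; the IDs and function here are purely illustrative:

```python
# An assistant message that requests a tool invocation.
assistant_msg = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {
            "id": "call_001",  # hypothetical ID generated by the model
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }
    ],
}

# The tool's result, linked back via tool_call_id.
tool_msg = {
    "role": "tool",
    "tool_call_id": "call_001",  # must match the assistant's tool call ID
    "content": '{"temp_c": 18}',
}

# The tool result is appended immediately after the assistant message
# that requested it, then the conversation continues as normal.
messages = [assistant_msg, tool_msg]
```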
## Parameter Details
### model
Identifies which AI model to use for generating completions. Different models have different capabilities, costs, and performance characteristics.
**Examples:**

- `gpt-4o`: OpenAI's GPT-4 Omni
- `gemini-1.5-pro`: Google's Gemini 1.5 Pro
- `claude-3-5-sonnet-20241022`: Anthropic's Claude 3.5 Sonnet
### messages
An ordered array of message objects representing the conversation. The first message is typically a system message setting behavior, followed by alternating user and assistant messages.
**Structure:**

```json
[
  {
    "role": "system",
    "content": "You are a helpful assistant."
  },
  {
    "role": "user",
    "content": "Hello! How are you?"
  },
  {
    "role": "assistant",
    "content": "I'm doing well, thank you!"
  }
]
```
### stream
When set to true, the API returns responses as Server-Sent Events (SSE), allowing you to display partial results as they're generated. When false, the complete response is returned in a single JSON object.
**Streaming Response Format:**

```
data: {"choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"choices":[{"delta":{"content":" world!"},"index":0}]}
data: [DONE]
```
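The SSE lines above can be assembled client-side by concatenating each chunk's delta content until the `[DONE]` sentinel, as in this Python sketch:

```python
import json

def collect_stream(sse_lines):
    """Accumulate delta content from SSE data lines until [DONE]."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel, not JSON
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

events = [
    'data: {"choices":[{"delta":{"content":"Hello"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":" world!"},"index":0}]}',
    'data: [DONE]',
]
print(collect_stream(events))  # → Hello world!
```

A real client would iterate over the HTTP response body line by line rather than a prepared list, but the parsing logic is the same.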
### temperature
Controls the randomness of the model's output. Higher values (closer to 2.0) make the output more random and creative, while lower values (closer to 0.0) make it more focused and deterministic.
**Guidelines:**

- 0.0 - 0.3: Highly deterministic, factual responses
- 0.7 - 1.0: Balanced creativity and coherence (default)
- 1.0 - 2.0: Highly creative, unpredictable outputs
### max_tokens
Sets an upper limit on the number of tokens the model can generate in the response. The actual output may be shorter if the model naturally completes its response.
**Note:** Different models have different maximum token limits. Exceeding these limits will result in an error.
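One practical way to tell that the limit was hit: Chat Completions-style responses commonly report a `finish_reason` of `"length"` on a truncated choice. That field is not documented in this section, so treat this sketch as an assumption about the response shape:

```python
def was_truncated(response):
    """Return True if generation stopped because max_tokens was reached.

    Assumes the common Chat Completions response shape, where each choice
    carries a finish_reason such as "stop" or "length".
    """
    return any(c.get("finish_reason") == "length" for c in response["choices"])

resp = {"choices": [{"message": {"content": "Truncat"}, "finish_reason": "length"}]}
print(was_truncated(resp))  # → True
```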
### top_p
An alternative to temperature for controlling diversity. Only the top p probability mass of tokens is considered for sampling.
**Recommendation:** Use either `temperature` or `top_p`, but not both simultaneously.
### frequency_penalty
Reduces the likelihood of the model repeating the same content. Positive values discourage repetition, while negative values encourage it.
**Range:** -2.0 to 2.0
### presence_penalty
Encourages the model to talk about new topics. Higher positive values make the model more likely to introduce new concepts.
**Range:** -2.0 to 2.0
### stop
Specifies sequences where the model should stop generating. This can be a single string or an array of up to 4 sequences.
**Example:**

```json
{
  "stop": ["\n", "END"]
}
```
## Request Examples
### Minimal Request

```json
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ]
}
```
### Request with Temperature Control

```json
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are a creative writer."
    },
    {
      "role": "user",
      "content": "Write a short story about a robot."
    }
  ],
  "temperature": 1.5,
  "max_tokens": 500
}
```
### Request with Streaming

```json
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "Explain quantum computing in simple terms."
    }
  ],
  "stream": true,
  "temperature": 0.7
}
```
### Request with Penalties

```json
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "Write a product description for a new smartphone."
    }
  ],
  "temperature": 0.8,
  "frequency_penalty": 0.5,
  "presence_penalty": 0.3,
  "max_tokens": 300
}
```
### Request with Stop Sequences

```json
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "List three programming languages."
    }
  ],
  "temperature": 0.5,
  "stop": ["4.", "\n\n"]
}
```
### Complex Multi-turn Conversation

```json
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are a technical support specialist. Be concise and helpful."
    },
    {
      "role": "user",
      "content": "My internet connection is slow. What should I do?"
    },
    {
      "role": "assistant",
      "content": "Try these steps:\n1. Restart your router\n2. Check your cables\n3. Test with another device"
    },
    {
      "role": "user",
      "content": "I tried those but it's still slow."
    }
  ],
  "temperature": 0.3,
  "max_tokens": 200
}
```
## Best Practices
### Choosing Temperature
- Code generation: Use low temperature (0.0 - 0.3)
- Creative writing: Use higher temperature (0.8 - 1.5)
- General assistance: Use default (1.0)
### Managing Token Limits

- Always set `max_tokens` to control costs
- Account for both input and output tokens in your pricing calculations
- Different models have different context windows
### Optimizing for Quality

- Use `presence_penalty` to encourage variety in long conversations
- Use `frequency_penalty` to avoid repetitive phrasing
- Provide clear `system` messages to set expectations
### Streaming vs Non-Streaming
- Use streaming for real-time user interfaces
- Use non-streaming for batch processing and API integrations
## Common Errors
| Error | Cause | Solution |
|---|---|---|
| `invalid_model` | Non-existent model ID | Verify the model ID against the models list |
| `invalid_messages` | Empty or malformed `messages` array | Ensure at least one message with a valid `role` and `content` |
| `invalid_temperature` | Temperature outside the 0.0 - 2.0 range | Adjust `temperature` to the valid range |
| `invalid_max_tokens` | Exceeds the model's maximum | Reduce the `max_tokens` value |
| `invalid_stop` | More than 4 stop sequences | Limit to 4 sequences maximum |
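A client can turn the codes in this table into actionable messages. This Python sketch assumes the error body looks like `{"error": {"code": ..., "message": ...}}`, a common Chat Completions convention that this table does not itself guarantee:

```python
# Remediations taken from the error table above.
FIXES = {
    "invalid_model": "Verify the model ID against the models list",
    "invalid_messages": "Ensure at least one message with valid role and content",
    "invalid_temperature": "Adjust temperature to the 0.0-2.0 range",
    "invalid_max_tokens": "Reduce the max_tokens value",
    "invalid_stop": "Limit stop to at most 4 sequences",
}

def explain_error(error_body):
    """Map an error response body to a remediation hint."""
    code = error_body.get("error", {}).get("code", "")
    return FIXES.get(code, "Unrecognized error; inspect the message field")

print(explain_error({"error": {"code": "invalid_stop", "message": "too many"}}))
```

None of these are retryable as-is; each requires changing the request before resending.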