Request Parameters

Complete reference for request parameters in the Chat Completions API.

Overview

The Chat Completions API provides a unified interface for interacting with multiple LLM providers. All requests require at minimum a model identifier and a messages array.

Base Endpoint: POST /chat/completions

Authentication: Bearer token in the Authorization header.
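As a sketch, a request can be assembled client-side before sending. The base URL and API key below are placeholders, not real values; substitute your own deployment's endpoint and credentials:

```python
import json

# Placeholder values -- replace with your deployment's endpoint and key.
API_BASE = "https://api.example.com"
API_KEY = "YOUR_API_KEY"

def build_request(model, messages, **options):
    """Assemble the URL, headers, and JSON body for a chat completion call."""
    url = f"{API_BASE}/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",  # bearer-token authentication
        "Content-Type": "application/json",
    }
    body = {"model": model, "messages": messages, **options}
    return url, headers, json.dumps(body)

url, headers, body = build_request(
    "gpt-4o",
    [{"role": "user", "content": "Hello"}],
    temperature=0.7,
)
```

The assembled body can then be sent with any HTTP client; only model and messages are required, and every optional parameter passes through unchanged.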

Required Parameters

Parameter | Type | Description
--------- | ---- | -----------
model | string | The ID of the model to use (e.g., gpt-4o, gemini-1.5-pro, claude-3-5-sonnet-20241022)
messages | array | An array of message objects representing the conversation history

Optional Parameters

Parameter | Type | Default | Range | Description
--------- | ---- | ------- | ----- | -----------
stream | boolean | false | - | Enable streaming responses via Server-Sent Events
temperature | number | 1.0 | 0.0 - 2.0 | Sampling temperature controlling randomness
max_tokens | integer | model-dependent | varies | Maximum number of tokens to generate
top_p | number | 1.0 | 0.0 - 1.0 | Nucleus sampling threshold
frequency_penalty | number | 0 | -2.0 - 2.0 | Penalty for frequently used tokens
presence_penalty | number | 0 | -2.0 - 2.0 | Penalty for introducing new topics
stop | string or array | null | - | Sequences where generation stops

Message Object Structure

Each message in the messages array must contain:

Parameter | Type | Required | Description
--------- | ---- | -------- | -----------
role | string | Yes | The role of the message author: system, user, assistant, or tool
content | string | Yes | The content of the message
name | string | No | Optional name of the participant
tool_calls | array | No | Tool calls generated by the model (for role: assistant)
tool_call_id | string | No | ID of the tool call (for role: tool)

Parameter Details

model

Identifies which AI model to use for generating completions. Different models have different capabilities, costs, and performance characteristics.

Examples:

  • gpt-4o - OpenAI's GPT-4 Omni
  • gemini-1.5-pro - Google's Gemini 1.5 Pro
  • claude-3-5-sonnet-20241022 - Anthropic's Claude 3.5 Sonnet

messages

An ordered array of message objects representing the conversation. The first message is typically a system message setting behavior, followed by alternating user and assistant messages.

Structure:

[
  {
    "role": "system",
    "content": "You are a helpful assistant."
  },
  {
    "role": "user",
    "content": "Hello! How are you?"
  },
  {
    "role": "assistant",
    "content": "I'm doing well, thank you!"
  }
]

stream

When set to true, the API returns responses as Server-Sent Events (SSE), allowing you to display partial results as they're generated. When false, the complete response is returned in a single JSON object.

Streaming Response Format:

data: {"choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"choices":[{"delta":{"content":" world!"},"index":0}]}

data: [DONE]
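The event stream above can be consumed by reading lines, stripping the data: prefix, and accumulating the content deltas until the [DONE] sentinel. A minimal parsing sketch, assuming the delta format shown above:

```python
import json

def parse_sse_chunks(lines):
    """Yield content deltas from raw SSE lines until the [DONE] sentinel."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # server signals end of stream
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

raw = [
    'data: {"choices":[{"delta":{"content":"Hello"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":" world!"},"index":0}]}',
    'data: [DONE]',
]
text = "".join(parse_sse_chunks(raw))  # "Hello world!"
```

In a real client the lines would come from the HTTP response body rather than a list, but the parsing logic is the same.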

temperature

Controls the randomness of the model's output. Higher values (closer to 2.0) make the output more random and creative, while lower values (closer to 0.0) make it more focused and deterministic.

Guidelines:

  • 0.0 - 0.3: Highly deterministic, factual responses
  • 0.7 - 1.0: Balanced creativity and coherence (the 1.0 default falls here)
  • 1.0 - 2.0: Highly creative, unpredictable outputs

max_tokens

Sets an upper limit on the number of tokens the model can generate in the response. The actual output may be shorter if the model naturally completes its response.

Note: Different models have different maximum token limits. Exceeding these limits will result in an error.

top_p

An alternative to temperature for controlling diversity. The model samples only from the smallest set of tokens whose cumulative probability mass exceeds p, discarding the unlikely tail.

Recommendation: Use either temperature or top_p, but not both simultaneously.

frequency_penalty

Reduces the likelihood of the model repeating the same content. Positive values discourage repetition, while negative values encourage it.

Range: -2.0 to 2.0

presence_penalty

Encourages the model to talk about new topics. Higher positive values make the model more likely to introduce new concepts.

Range: -2.0 to 2.0

stop

Specifies sequences where the model should stop generating. This can be a single string or an array of up to 4 sequences.

Example:

{
  "stop": ["\n", "END"]
}

Request Examples

Minimal Request

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ]
}

Request with Temperature Control

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are a creative writer."
    },
    {
      "role": "user",
      "content": "Write a short story about a robot."
    }
  ],
  "temperature": 1.5,
  "max_tokens": 500
}

Request with Streaming

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "Explain quantum computing in simple terms."
    }
  ],
  "stream": true,
  "temperature": 0.7
}

Request with Penalties

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "Write a product description for a new smartphone."
    }
  ],
  "temperature": 0.8,
  "frequency_penalty": 0.5,
  "presence_penalty": 0.3,
  "max_tokens": 300
}

Request with Stop Sequences

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "List three programming languages."
    }
  ],
  "temperature": 0.5,
  "stop": ["4.", "\n\n"]
}

Complex Multi-turn Conversation

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are a technical support specialist. Be concise and helpful."
    },
    {
      "role": "user",
      "content": "My internet connection is slow. What should I do?"
    },
    {
      "role": "assistant",
      "content": "Try these steps:\n1. Restart your router\n2. Check your cables\n3. Test with another device"
    },
    {
      "role": "user",
      "content": "I tried those but it's still slow."
    }
  ],
  "temperature": 0.3,
  "max_tokens": 200
}
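Because the API is stateless, the client maintains the conversation by resending the full history each turn: append the model's reply and the user's follow-up before the next request. A sketch of that bookkeeping:

```python
def extend_history(messages, assistant_reply, next_user_message):
    """Append the model's reply and the user's follow-up to the history.

    The full messages array is resent on every request, so each
    conversational turn grows it by two entries.
    """
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]

history = [
    {"role": "system", "content": "You are a technical support specialist."},
    {"role": "user", "content": "My internet connection is slow."},
]
history = extend_history(history, "Try restarting your router.", "Still slow.")
```

Long conversations eventually approach the model's context window, at which point older turns must be truncated or summarized.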

Best Practices

Choosing Temperature

  • Code generation: Use low temperature (0.0 - 0.3)
  • Creative writing: Use higher temperature (0.8 - 1.5)
  • General assistance: Use default (1.0)

Managing Token Limits

  • Always set max_tokens to control costs
  • Account for both input and output tokens in your pricing calculations
  • Different models have different context windows
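A rough pre-flight check can estimate whether input plus output budget fits the context window. The 4-characters-per-token heuristic below is an approximation for English text, not a real tokenizer, and the window size is an illustrative placeholder:

```python
def rough_token_estimate(text):
    """Very rough heuristic: ~4 characters per token for English text.

    Use the model's actual tokenizer for billing-accurate counts.
    """
    return max(1, len(text) // 4)

def fits_context(messages, max_tokens, context_window):
    """Check that estimated input tokens plus the output budget fit."""
    input_tokens = sum(rough_token_estimate(m["content"]) for m in messages)
    return input_tokens + max_tokens <= context_window

msgs = [{"role": "user", "content": "What is the capital of France?"}]
ok = fits_context(msgs, max_tokens=200, context_window=8192)  # True
```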

Optimizing for Quality

  • Use presence_penalty to encourage variety in long conversations
  • Use frequency_penalty to avoid repetitive phrasing
  • Provide clear system messages to set expectations

Streaming vs Non-Streaming

  • Use streaming for real-time user interfaces
  • Use non-streaming for batch processing and API integrations

Common Errors

Error | Cause | Solution
----- | ----- | --------
invalid_model | Non-existent model ID | Verify the model ID against the models list
invalid_messages | Empty or malformed messages array | Ensure at least one message with a valid role and content
invalid_temperature | Temperature outside the 0.0 - 2.0 range | Adjust temperature to the valid range
invalid_max_tokens | Exceeds the model's maximum | Reduce the max_tokens value
invalid_stop | More than 4 stop sequences | Limit to 4 sequences maximum
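Several of these errors can be caught client-side before a request is sent. A validation sketch mirroring the checks above (the error names follow the table; the helper itself is illustrative, not part of the API):

```python
def validate_request(body):
    """Client-side checks mirroring the common API errors above."""
    errors = []
    if not body.get("model"):
        errors.append("invalid_model")
    messages = body.get("messages") or []
    if not messages or any("role" not in m or "content" not in m for m in messages):
        errors.append("invalid_messages")
    temperature = body.get("temperature", 1.0)
    if not 0.0 <= temperature <= 2.0:
        errors.append("invalid_temperature")
    stop = body.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        errors.append("invalid_stop")
    return errors

errs = validate_request({"model": "", "messages": []})
# errs == ["invalid_model", "invalid_messages"]
```

Note that invalid_max_tokens cannot be fully checked client-side, since the limit varies by model.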
