
Chat Completions

Generate conversational responses using the OpenAI-compatible chat completions API

Create chat completion

Creates a chat completion response with OpenAI compatibility. Supports streaming, tool calling, and structured output.

POST /chat/completions

Authorization: Bearer <token> (API key, sent in the request header)

model (string, required)
Model ID to use for completion

messages (array<ChatMessage>, required)
List of messages in the conversation

max_tokens (integer, optional)
Maximum number of tokens to generate. Range: value >= 1

temperature (number, optional)
Sampling temperature. Range: 0 <= value <= 2

top_p (number, optional)
Nucleus sampling parameter. Range: 0 <= value <= 1

stream (boolean, optional)
Whether to stream back partial progress (see the streaming sketch after this list)

tools (array<Tool>, optional)
List of tools available to the model (see the tool-calling example under Message Roles)

tool_choice (string | object, optional)
Controls which tools are called

response_format (object, optional)
Format for the response (see the Structured Output sketch under Advanced Features)
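
The endpoint advertises streaming support. Below is a minimal sketch of consuming a stream with the OpenAI Python SDK, assuming MegaLLM relays OpenAI-style chunks whose delta field carries incremental content:

from openai import OpenAI

client = OpenAI(
    base_url="https://ai.megallm.io/v1",
    api_key="your-api-key"
)

# stream=True returns an iterator of chunks instead of a single response
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True
)

for chunk in stream:
    # Some chunks (e.g. the final one) carry no content
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)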

Example Request

curl -X POST "http://localhost:4141/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ],
    "max_tokens": 100
  }'

Response Body (200 OK)
{
  "id": "string",
  "object": "chat.completion",
  "created": 0,
  "model": "string",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "system",
        "content": "string",
        "name": "string",
        "tool_calls": [
          {
            "id": "string",
            "type": "function",
            "function": {
              "name": "string",
              "arguments": "string"
            }
          }
        ],
        "tool_call_id": "string"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}

400 Bad Request

{
  "error": {
    "message": "Invalid request parameter",
    "type": "invalid_request_error",
    "param": "model"
  }
}

401 Unauthorized

{
  "error": {
    "message": "Invalid authentication credentials",
    "type": "authentication_error"
  }
}

429 Too Many Requests

{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error"
  }
}

500 Internal Server Error

{
  "error": {
    "message": "Internal server error",
    "type": "internal_error"
  }
}

Overview

The Chat Completions API enables you to build conversational experiences using GPT models. Send a list of messages and receive an AI-generated response.

Basic Usage

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://ai.megallm.io/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather like?"}
    ],
    temperature=0.7,
    max_tokens=150
)

print(response.choices[0].message.content)

TypeScript

import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://ai.megallm.io/v1',
  apiKey: process.env.MEGALLM_API_KEY,
});

const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: "What's the weather like?" }
  ],
  temperature: 0.7,
  max_tokens: 150
});

console.log(response.choices[0].message.content);

cURL

curl https://ai.megallm.io/v1/chat/completions \
  -H "Authorization: Bearer $MEGALLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What'\''s the weather like?"}
    ],
    "temperature": 0.7,
    "max_tokens": 150
  }'

Go

package main

import (
    "context"
    "fmt"

    openai "github.com/sashabaranov/go-openai"
)

func main() {
    config := openai.DefaultConfig("your-api-key")
    config.BaseURL = "https://ai.megallm.io/v1"
    client := openai.NewClientWithConfig(config)

    resp, err := client.CreateChatCompletion(
        context.Background(),
        openai.ChatCompletionRequest{
            Model: "gpt-4",
            Messages: []openai.ChatCompletionMessage{
                {
                    Role:    "system",
                    Content: "You are a helpful assistant.",
                },
                {
                    Role:    "user",
                    Content: "What's the weather like?",
                },
            },
            Temperature: 0.7,
            MaxTokens:   150,
        },
    )

    if err != nil {
        panic(err)
    }

    fmt.Println(resp.Choices[0].Message.Content)
}

Advanced Features

Message Roles

The API supports different message roles for conversation context:

Role      | Description               | Example
----------|---------------------------|----------------------------------
system    | Sets behavior and context | "You are a helpful assistant"
user      | User input/questions      | "What's the capital of France?"
assistant | AI responses              | "The capital of France is Paris"
tool      | Tool/function results     | Function execution results
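
The tool role closes the loop started by the tools parameter: the model answers with tool_calls, your code runs the function, and the result goes back as a tool message. A sketch assuming OpenAI-style function tools; get_weather and its JSON result are hypothetical:

import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function, for illustration
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
response = client.chat.completions.create(
    model="gpt-4", messages=messages, tools=tools
)

call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)  # e.g. {"city": "Paris"}

# Run your function, then send the result back with the matching tool_call_id
messages.append(response.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": json.dumps({"temp_c": 18})  # hypothetical result
})
final = client.chat.completions.create(model="gpt-4", messages=messages)
print(final.choices[0].message.content)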

Temperature Control

Adjust response creativity with the temperature parameter:

# More deterministic (0.0 - 0.3)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    temperature=0.2  # More focused, consistent responses
)

# Balanced (0.4 - 0.7)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    temperature=0.5  # Balanced creativity and coherence
)

# More creative (0.8 - 1.0)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    temperature=0.9  # More varied, creative responses
)

Multi-turn Conversations

Maintain context across multiple exchanges:

messages = [
    {"role": "system", "content": "You are a math tutor."},
    {"role": "user", "content": "What is 15 * 12?"},
    {"role": "assistant", "content": "15 * 12 = 180"},
    {"role": "user", "content": "Now divide that by 6"}
]

response = client.chat.completions.create(
    model="gpt-4",
    messages=messages
)
# Response understands "that" refers to 180
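
To continue the exchange, append the assistant's reply and the next user turn to the same list before calling again:

# Carry the answer forward so the next turn keeps full context
messages.append({"role": "assistant", "content": response.choices[0].message.content})
messages.append({"role": "user", "content": "Now multiply that by 2"})

response = client.chat.completions.create(
    model="gpt-4",
    messages=messages
)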

Vision Support

Process images alongside text (requires vision-capable models):

response = client.chat.completions.create(
    model="gpt-4-vision",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg",
                        "detail": "high"  # "low", "high", or "auto"
                    }
                }
            ]
        }
    ],
    max_tokens=300
)
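
Structured Output

The endpoint also lists structured output among its features, exposed through the response_format parameter. A minimal sketch assuming OpenAI-style JSON mode; check per-model support before relying on it:

import json

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Reply with a JSON object containing 'city' and 'country'."},
        {"role": "user", "content": "Where is the Eiffel Tower?"}
    ],
    # Assumed OpenAI-compatible JSON mode; the model must be told to emit JSON
    response_format={"type": "json_object"}
)

data = json.loads(response.choices[0].message.content)
print(data["city"], data["country"])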

Response Format

Standard Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677858242,
  "model": "gpt-4",
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 17,
    "total_tokens": 30
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop",
      "index": 0
    }
  ]
}

Finish Reasons

Reason         | Description
---------------|--------------------------------
stop           | Natural completion
length         | Hit max_tokens limit
tool_calls     | Model wants to call a function
content_filter | Content was filtered
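
Branching on finish_reason lets a client detect truncation or pending tool calls; handle_tool_calls below stands in for your own handler:

choice = response.choices[0]

if choice.finish_reason == "length":
    # Cut off by max_tokens: raise the limit or ask the model to continue
    print("Warning: response truncated")
elif choice.finish_reason == "tool_calls":
    # The model wants a function executed
    handle_tool_calls(choice.message.tool_calls)  # hypothetical handler
elif choice.finish_reason == "content_filter":
    print("Response was filtered")
else:  # "stop"
    print(choice.message.content)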

Best Practices

System Messages: Always include a clear system message to set the AI's behavior and context.

Token Optimization

# Count tokens before sending
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")
tokens = encoding.encode("Your message here")
print(f"Token count: {len(tokens)}")

# Truncate if needed
if len(tokens) > 1000:
    truncated = encoding.decode(tokens[:1000])

Error Handling

try:
    response = client.chat.completions.create(...)
except openai.RateLimitError as e:
    print(f"Rate limit hit: {e}")
    # Implement exponential backoff (see the sketch below)
except openai.APIConnectionError as e:
    print(f"Connection error: {e}")
    # Retry with backoff
except openai.APIError as e:
    # Base class for the errors above, so catch it last
    print(f"API error: {e}")
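
A minimal backoff helper for the rate-limit and connection cases above; the retry count and delays are illustrative:

import time
import openai

def create_with_backoff(client, max_retries=5, **kwargs):
    """Retry chat completions, sleeping 1s, 2s, 4s, ... between attempts."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except (openai.RateLimitError, openai.APIConnectionError):
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            time.sleep(2 ** attempt)

# Usage
response = create_with_backoff(
    client,
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)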

Common Patterns

Summarization

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a summarization expert."},
        {"role": "user", "content": f"Summarize this text in 3 bullet points: {long_text}"}
    ],
    temperature=0.3,
    max_tokens=150
)

Classification

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Classify the sentiment as positive, negative, or neutral."},
        {"role": "user", "content": "I love this product! It works great."}
    ],
    temperature=0.0,
    max_tokens=10
)

Code Generation

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a Python expert."},
        {"role": "user", "content": "Write a function to calculate fibonacci numbers"}
    ],
    temperature=0.2,
    max_tokens=500
)

Rate Limiting

Implement proper rate limiting to avoid errors:

import time

class RateLimiter:
    def __init__(self, requests_per_minute: int = 60):
        self.requests_per_minute = requests_per_minute
        self.request_times = []

    def wait_if_needed(self):
        now = time.time()
        # Drop requests older than 1 minute from the sliding window
        self.request_times = [t for t in self.request_times if now - t < 60]

        if len(self.request_times) >= self.requests_per_minute:
            sleep_time = 60 - (now - self.request_times[0])
            if sleep_time > 0:
                time.sleep(sleep_time)
                now = time.time()  # refresh the timestamp after sleeping

        self.request_times.append(now)

# Usage
limiter = RateLimiter(60)

for prompt in prompts:
    limiter.wait_if_needed()
    response = client.chat.completions.create(...)

Next Steps