2. Chat Completion Streaming API

In this lesson, we configure the OpenAI API to return its response as a continuous stream of chunks instead of a single JSON object.

Curl Request Using a JSON File

To modify the curl request from the previous lesson, we store the JSON payload in a file instead of writing it inline on the command line, and pass the full path to that file as an argument to curl.

First, we create the JSON file at /home/tony/chatgpt-emacs/request.json with the following content:

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "developer",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello!"
    }
  ]
}

To send the request using curl, we use the @ symbol to reference the file:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-proj-7pQDxN...w-D40A" \
  -d @/home/tony/chatgpt-emacs/request.json

Upon sending this request to the OpenAI API, we receive the following response:

{
  "id": "chatcmpl-B32ndGQSiexNKDdi5WKvHmYLZOihy",
  "object": "chat.completion",
  "created": 1740065445,
  "model": "gpt-4o-2024-08-06",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 10,
    "total_tokens": 29,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "service_tier": "default",
  "system_fingerprint": "fp_fee4aaf18f"
}
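
As a minimal sketch of how a client might read this non-streamed response, the Python snippet below parses a trimmed copy of the JSON shown above and pulls out the assistant's text. The function name extract_reply and the variable response_text are hypothetical, chosen only for illustration.

```python
import json

# Trimmed copy of the response shown above; only the fields used here are kept.
response_text = """
{
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 19, "completion_tokens": 10, "total_tokens": 29}
}
"""


def extract_reply(raw: str) -> str:
    """Return the assistant text from the first choice of a non-streamed response."""
    body = json.loads(raw)
    return body["choices"][0]["message"]["content"]


print(extract_reply(response_text))
```

The whole reply is available only after the complete JSON object has arrived, which is the behavior streaming changes.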

Chat Completion Streaming API

So far, the OpenAI Chat Completion API has returned each response as a single JSON object, which is its default behavior. To receive the response as a stream instead, we need to adjust the request payload.

Updating the Request for Streaming

To enable streaming, we include the stream parameter set to true in our JSON request like this:

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "developer",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "stream": true
}
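
If the payload is built in code rather than edited by hand, the same change can be sketched in Python as follows. The helper name enable_streaming is hypothetical; the payload mirrors the request shown above.

```python
import json


def enable_streaming(payload: dict) -> dict:
    """Return a copy of the request payload with "stream" set to true."""
    updated = dict(payload)  # shallow copy, so the caller's dict is left unchanged
    updated["stream"] = True
    return updated


request = {
    "model": "gpt-4o",
    "messages": [
        {"role": "developer", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

# json.dumps writes Python's True as the JSON literal true.
print(json.dumps(enable_streaming(request), indent=2))
```

The printed JSON could then be saved to the request file that curl reads with the @ syntax.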

Observing Streaming Responses

Once the updated request is sent to OpenAI, we start receiving data in chunks. Here's a sample of the streamed output:

data: {"id":"chatcmpl-B33LUkTZbglsA7jDBpkMhDuQd6Mxp",
"object":"chat.completion.chunk","created":1740067544
,"model":"gpt-4o-2024-08-06","service_tier":"default"
,"system_fingerprint":"fp_fee4aaf18f","choices":[{"i
ndex":0,"delta":{"role":"assistant","content":"","ref
usal":null},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-B33LUkTZbglsA7jDBpkMhDuQd6Mxp",
"object":"chat.completion.chunk","created":1740067544
,"model":"gpt-4o-2024-08-06","service_tier":"default"
,"system_fingerprint":"fp_fee4aaf18f","choices":[{"in
dex":0,"delta":{"content":"Hello"},"logprobs":null,"f
inish_reason":null}]}

... [additional chunks omitted for brevity] ...

data: {"id":"chatcmpl-B33LUkTZbglsA7jDBpkMhDuQd6Mxp","
object":"chat.completion.chunk","created":1740067544,
"model":"gpt-4o-2024-08-06","service_tier":"default","
system_fingerprint":"fp_fee4aaf18f","choices":[{"index
":0,"delta":{},"logprobs":null,"finish_reason":"stop"}
]}

data: [DONE]
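
One way to process this output is sketched below in Python. It assumes that each event arrives as a single line beginning with data: (the sample above is wrapped only for display) and that the literal [DONE] marks the end of the stream. The function name iter_chunks is hypothetical, and the sample lines are shortened copies of the chunks above.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield each streamed chunk as a dict, stopping at the [DONE] marker."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip the blank lines that separate events
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker, not JSON
        yield json.loads(data)


# Shortened copies of the chunks above, one event per line.
sample = [
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,'
    '"delta":{"role":"assistant","content":"","refusal":null},'
    '"logprobs":null,"finish_reason":null}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,'
    '"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,'
    '"delta":{},"logprobs":null,"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

chunks = list(iter_chunks(sample))
for chunk in chunks:
    choice = chunk["choices"][0]
    print(choice["delta"], choice["finish_reason"])
```

Checking for [DONE] before calling json.loads matters here, because that final marker is plain text rather than a JSON object.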

Response Breakdown

In the streamed response, data is sent in chunks. Each chunk carries a small part of the message in its delta field, like so:

data: {"id":"chatcmpl-B33LUkTZbglsA7jDBpkMhDuQd6Mxp",
..."choices":[{"...,"delta":{"content":" How"}...}]}

data: {"id":"chatcmpl-B33LUkTZbglsA7jDBpkMhDuQd6Mxp",
..."choices":[{"...,"delta":{"content":" can"}...}]}

data: {"id":"chatcmpl-B33LUkTZbglsA7jDBpkMhDuQd6Mxp",
..."choices":[{"...,"delta":{"content":" I"}...}]}

data: {"id":"chatcmpl-B33LUkTZbglsA7jDBpkMhDuQd6Mxp",
..."choices":[{"...,"delta":{"content":" assist"}...}]}

Each delta field corresponds to a portion of the generated response, progressively building the output.
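
A minimal Python sketch of that assembly step is shown below. The function name assemble is hypothetical, and the input list contains only the four deltas shown in this breakdown plus an empty final delta, not the full stream.

```python
from typing import Iterable


def assemble(deltas: Iterable[dict]) -> str:
    """Concatenate the "content" pieces of a sequence of delta objects."""
    parts = []
    for delta in deltas:
        # The final delta is empty ({}), so fall back to an empty string.
        parts.append(delta.get("content") or "")
    return "".join(parts)


# Only the deltas shown in the breakdown above, followed by the empty final delta.
deltas = [
    {"content": " How"},
    {"content": " can"},
    {"content": " I"},
    {"content": " assist"},
    {},
]

print(repr(assemble(deltas)))  # prints ' How can I assist'
```

Each piece already carries its own leading space, so the pieces are joined with an empty separator rather than with spaces.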