
128k whitespace tokens, by accident: debugging the Responses API

tl;dr: Setting max_output_tokens is now a hard rule for me when calling the Responses API. While generating structured output, I hit a bug that produced 128k tokens. Removing the array length constraint from the schema fixed it!

Always set the max_output_tokens request parameter in the OpenAI Responses API. This is my new rule.

While testing the phrasebook-fr-to-en CLI, the API unexpectedly generated 128k tokens, the maximum. I expected JSON like this:

{
  "translations": [
    {
      "french": "fr2",
      "english": "en2"
    },
    {
      "french": "fr3",
      "english": "en3"
    }
  ]
}

I used the Structured Outputs feature and defined the expected JSON schema with this Pydantic model:

# DON'T USE THIS CODE.  IT TRIGGERS THE RESPONSES API BUG.
from openai import OpenAI
from pydantic import BaseModel, conlist

client = OpenAI()

class Translation(BaseModel):
    french: str
    english: str

class Translations(BaseModel):
    # THE BUG IS CAUSED BY THE FOLLOWING LINE
    translations: conlist(Translation, min_length=2, max_length=2)

response = client.responses.parse(
    model="gpt-5.2",
    input="Il est beau. -> He is handsome.",
    text_format=Translations,
)

Instead, I got incomplete JSON padded with more than 63,000 whitespace characters. In total, the output hit exactly 128k tokens. If I had set the max_output_tokens request parameter to 256, generation would have stopped at 256 tokens.
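Here is a minimal sketch of that rule, assuming the Responses API marks a cut-off generation with status == "incomplete" and incomplete_details.reason == "max_output_tokens". The ensure_complete helper and IncompleteResponseError are hypothetical names, and the SimpleNamespace stand-ins let the check run without an API key:

```python
from types import SimpleNamespace


class IncompleteResponseError(RuntimeError):
    """Hypothetical error raised when the model stopped before finishing."""


def ensure_complete(response):
    """Raise if a Responses API result was cut off; otherwise return it.

    Assumes the response exposes `status` and `incomplete_details.reason`.
    """
    if response.status == "incomplete":
        details = response.incomplete_details
        reason = details.reason if details is not None else "unknown"
        raise IncompleteResponseError(f"response incomplete: {reason}")
    return response


# Real usage (needs the openai package, an API key, and the models above):
#   response = client.responses.parse(
#       model="gpt-5.2",
#       input="Il est beau. -> He is handsome.",
#       text_format=Translations,
#       max_output_tokens=256,  # caps the damage if the model runs away
#   )
#   ensure_complete(response)

# Offline demo with stand-in objects shaped like real responses.
truncated = SimpleNamespace(
    status="incomplete",
    incomplete_details=SimpleNamespace(reason="max_output_tokens"),
)
ok = SimpleNamespace(status="completed", incomplete_details=None)

ensure_complete(ok)  # returns the response unchanged
try:
    ensure_complete(truncated)
except IncompleteResponseError as err:
    print(err)  # response incomplete: max_output_tokens
```

Failing loudly is the point: a runaway response costs at most 256 tokens instead of 128k, and the caller finds out right away.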


The bug seems to be triggered by the minItems and maxItems keywords on the array in the JSON schema. The Pydantic model sets both to 2 via conlist(Translation, min_length=2, max_length=2). Per the docs, this should have worked.

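For reference, here is roughly the schema Pydantic v2 generates for the conlist-based Translations model, written out by hand so the snippet runs without Pydantic (in practice Translations.model_json_schema() produces it, and the OpenAI SDK may adjust it further for strict mode). The hypothetical find_array_length_constraints helper walks a schema and reports every node that carries minItems or maxItems, which is one way to spot the pattern before sending a request:

```python
# Roughly what Pydantic v2 emits for the conlist-based Translations model
# (hand-written here so the snippet needs no third-party packages).
TRANSLATIONS_SCHEMA = {
    "$defs": {
        "Translation": {
            "properties": {
                "french": {"title": "French", "type": "string"},
                "english": {"title": "English", "type": "string"},
            },
            "required": ["french", "english"],
            "title": "Translation",
            "type": "object",
        }
    },
    "properties": {
        "translations": {
            "items": {"$ref": "#/$defs/Translation"},
            "maxItems": 2,
            "minItems": 2,
            "title": "Translations",
            "type": "array",
        }
    },
    "required": ["translations"],
    "title": "Translations",
    "type": "object",
}

LENGTH_KEYWORDS = ("minItems", "maxItems")


def find_array_length_constraints(schema, path="#"):
    """Return (path, {keyword: value}) for every node that sets minItems/maxItems.

    A rough recursive walk: it inspects every dict in the schema, so it would
    also flag a properties dict containing a field that is *named* minItems.
    """
    found = []
    if isinstance(schema, dict):
        hits = {k: schema[k] for k in LENGTH_KEYWORDS if k in schema}
        if hits:
            found.append((path, hits))
        for key, value in schema.items():
            found.extend(find_array_length_constraints(value, f"{path}/{key}"))
    elif isinstance(schema, list):
        for index, item in enumerate(schema):
            found.extend(find_array_length_constraints(item, f"{path}/{index}"))
    return found


for location, keywords in find_array_length_constraints(TRANSLATIONS_SCHEMA):
    print(location, keywords)  # #/properties/translations {'minItems': 2, 'maxItems': 2}
```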

Once I removed the length constraint on the array, the API responded correctly:

class Translations(BaseModel):
    translations: list[Translation]
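With list[Translation], the schema no longer guarantees exactly two pairs. If the count still matters, one option is to enforce it after parsing instead. This stdlib-only sketch parses the raw JSON text into dataclasses; parse_translations and EXPECTED_COUNT are hypothetical names, and with the SDK you would run the same length check on response.output_parsed.translations:

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    french: str
    english: str


# The length the removed conlist constraint used to enforce (hypothetical name).
EXPECTED_COUNT = 2


def parse_translations(raw: str, expected: int = EXPECTED_COUNT) -> list[Translation]:
    """Parse the model's JSON text and enforce the array length client-side."""
    data = json.loads(raw)
    items = [Translation(**item) for item in data["translations"]]
    if len(items) != expected:
        raise ValueError(f"expected {expected} translations, got {len(items)}")
    return items


raw = (
    '{"translations": ['
    '{"french": "fr2", "english": "en2"}, '
    '{"french": "fr3", "english": "en3"}]}'
)
pairs = parse_translations(raw)
print(pairs[0])  # Translation(french='fr2', english='en2')
```

This trades a schema-level guarantee for a client-side check, but it keeps the schema in the shape that worked.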

Lesson learned:

Always set the max_output_tokens request parameter in the OpenAI Responses API.

That's all I have for today! Talk soon 👋
