The response guard feature controls the size and structure of responses returned by downstream services. Response guard settings help prevent excessive memory usage and protect LLM pipelines from runaway payloads.

The response guard operates in one of three modes, which determine how oversized responses are handled. The following table summarizes the modes and their behavior:

| Mode | Behavior |
| --- | --- |
| `trim` | Truncates arrays to `max_array_items` and shortens large responses. This is the default mode. |
| `block` | Rejects responses that exceed configured limits and returns an error. |
| `warn` | Allows responses that exceed limits but logs a warning for visibility. |

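
The three modes can be illustrated with a minimal Python sketch that applies them to a single array. It is not the product's implementation: the names `Mode`, `ResponseTooLarge`, and `guard_array` are hypothetical, and only the array limit is modeled (using the documented default of 50 items).

```python
import logging
from enum import Enum

logger = logging.getLogger("response_guard_sketch")


class Mode(Enum):
    TRIM = "trim"
    BLOCK = "block"
    WARN = "warn"


class ResponseTooLarge(Exception):
    """Hypothetical error type standing in for the error returned in block mode."""


def guard_array(items, mode=Mode.TRIM, max_array_items=50):
    """Apply one guard mode to a list (illustrative only)."""
    if len(items) <= max_array_items:
        return items  # within the limit: every mode passes it through

    if mode is Mode.TRIM:
        # trim: keep only the first max_array_items entries
        return items[:max_array_items]
    if mode is Mode.BLOCK:
        # block: reject the response with an error
        raise ResponseTooLarge(
            f"array has {len(items)} items, limit is {max_array_items}"
        )
    # warn: log for visibility, then return the response unchanged
    logger.warning(
        "array has %d items, limit is %d", len(items), max_array_items
    )
    return items
```

With an 80-item list, `trim` returns the first 50 items, `block` raises the error, and `warn` returns all 80 items after logging a warning.
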

The response guard enforces two limits: one on the number of items in an array and one on the total size of the response payload. Together, these limits bound memory use and the amount of content that reaches an LLM's context.

| Limit | Default value | Purpose |
| --- | --- | --- |
| `max_array_items` | 50 | Restricts the maximum number of items in an array. |
| `max_response_bytes` | 1 MiB | Restricts the maximum size of a response payload. |
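
As a rough illustration of what the two limits measure, the following Python sketch reports which limits a JSON-like payload would exceed under the default values. The function `find_violations` is hypothetical. The sketch also makes two assumptions that the documentation above does not confirm: that nested arrays are checked individually, and that payload size is measured on the UTF-8 encoded JSON serialization.

```python
import json

MAX_ARRAY_ITEMS = 50               # documented default
MAX_RESPONSE_BYTES = 1024 * 1024   # 1 MiB = 1,048,576 bytes


def find_violations(payload):
    """Return a list of human-readable limit violations (illustrative only)."""
    violations = []

    def walk(node, path):
        # Assumption: every array in the payload is checked, including nested ones.
        if isinstance(node, list):
            if len(node) > MAX_ARRAY_ITEMS:
                violations.append(
                    f"{path}: {len(node)} items exceeds max_array_items"
                )
            for index, child in enumerate(node):
                walk(child, f"{path}[{index}]")
        elif isinstance(node, dict):
            for key, child in node.items():
                walk(child, f"{path}.{key}")

    walk(payload, "$")

    # Assumption: size is the byte length of the serialized JSON in UTF-8.
    size = len(json.dumps(payload).encode("utf-8"))
    if size > MAX_RESPONSE_BYTES:
        violations.append(f"$: {size} bytes exceeds max_response_bytes")

    return violations
```

A payload such as `{"rows": list(range(60))}` yields a single violation for `$.rows`, while a payload that stays under both limits yields an empty list.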