This topic explains the implementation details of rate limiting in the MCP Server.

When the runtime.middleware.rate_limiting.enabled property is set to true, the system adds a rate limiting middleware to enforce request quotas. An auditing layer logs each allow or deny decision under the category rate_limit.

The following table lists the main design principles:
Design point Description
Enforcement method Token bucket style enforcement supports requests per second (RPS) and optional burst capacity.
Per-client mode Uses a derived key hierarchy: client_idsession_idglobal fallback.
Global mode Uses a single shared bucket for all requests.
Fail-fast behavior Excess requests trigger an MCP error and the client receives a structured denial response.