Capacity and performance
- Last Updated: December 23, 2025
- OpenEdge
- Version 12.8
- Documentation
This section explains how to plan for system capacity and optimize performance to ensure reliable operations under varying workloads. It covers vertical scaling limits, horizontal scaling triggers, and strategies for resource management. Understanding these guidelines helps maintain system stability and support scalability for high availability.
Response and resource protection
The response guard is a critical mechanism that prevents excessive memory usage caused by large payloads. It keeps the system responsive and avoids crashes during high-volume operations. If tools consistently approach the guard limits, treat that as a signal that the underlying API should be redesigned to use pagination. Latency hotspots often occur at token exchange endpoints, during downstream TLS handshakes, and when the JWKS is retrieved at cache refresh intervals.
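The guard-and-paginate idea can be illustrated with a minimal Python sketch. This is not the server's actual implementation: the names `guard_response`, `paginate`, and `ResponseTooLargeError`, and the `MAX_RESPONSE_BYTES` value, are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical limit; the actual guard threshold is not stated in this section.
MAX_RESPONSE_BYTES = 1_000_000


class ResponseTooLargeError(Exception):
    """Raised when a serialized payload exceeds the guard limit."""


def guard_response(payload, max_bytes=MAX_RESPONSE_BYTES):
    """Serialize payload to JSON and reject it if it exceeds max_bytes."""
    body = json.dumps(payload).encode("utf-8")
    if len(body) > max_bytes:
        raise ResponseTooLargeError(
            f"response is {len(body)} bytes, limit is {max_bytes}; "
            "consider paginating the underlying API"
        )
    return body


def paginate(items, page_size):
    """Yield fixed-size pages so each response stays under the limit."""
    for start in range(0, len(items), page_size):
        yield items[start:start + page_size]
```

In this sketch, a tool that repeatedly triggers `ResponseTooLargeError` would be a candidate for returning pages from `paginate` instead of one large result.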
Scaling architecture and performance tuning
- The server uses single-threaded asynchronous I/O within each worker process.
- It employs an event-loop-based concurrency model to handle thousands of simultaneous connections.
- Non-blocking operations are used for HTTP requests and downstream API calls, which improves responsiveness.
- This architecture is highly efficient for I/O-bound workloads, which are typical in MCP server operations.
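The following Python sketch illustrates the general event-loop model described above, using the standard `asyncio` library. It is an illustration only, not the server's code: `call_downstream` is a hypothetical stand-in that simulates a downstream API call with `asyncio.sleep`.

```python
import asyncio
import time


async def call_downstream(name, delay):
    """Hypothetical stand-in for a non-blocking downstream API call."""
    # While this coroutine waits, the event loop is free to run others.
    await asyncio.sleep(delay)
    return f"{name}: done"


async def handle_requests():
    """Run three simulated calls concurrently on a single thread."""
    return await asyncio.gather(
        call_downstream("a", 0.2),
        call_downstream("b", 0.2),
        call_downstream("c", 0.2),
    )


def run_demo():
    """Return the results and the total elapsed wall-clock time."""
    start = time.perf_counter()
    results = asyncio.run(handle_requests())
    elapsed = time.perf_counter() - start
    return results, elapsed
```

Because the three waits overlap, the total elapsed time is close to the longest single delay rather than the sum of all delays, which is why this model suits I/O-bound workloads.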
Production scaling strategy
You can configure worker processes for different environments to achieve optimal performance and reliability. Worker configuration depends on workload characteristics such as I/O intensity, CPU requirements, and high-availability needs.
| Environment | Configuration | Purpose |
|---|---|---|
| Development | 1 worker | Use a single worker for simplicity during development |
| Production (I/O-heavy) | 2 workers | Recommended for workloads dominated by network I/O |
| Production (CPU-intensive) | Workers = number of CPU cores | Match the number of workers to available CPU cores |
| High-availability | Workers = 2 × number of CPU cores | Over-provision workers for redundancy and failover; requires active monitoring |
- In development, a single worker is sufficient because the workload is minimal and simplicity is preferred.
- For I/O-heavy production environments, two workers are recommended to handle concurrent requests efficiently without overloading the system.
- In CPU-intensive production environments, set the number of workers equal to the number of CPU cores to maximize parallel processing.
- For high-availability scenarios, set the number of workers to twice the number of CPU cores. This approach provides redundancy and supports failover, but it requires active monitoring to avoid resource contention.
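The recommendations above can be expressed as a small helper. This is a sketch only: the function name `recommended_workers` and the profile strings are hypothetical labels for the four environments, not configuration values defined by the product.

```python
import os


def recommended_workers(profile, cpu_cores=None):
    """Return a worker count for a hypothetical environment profile name."""
    if cpu_cores is None:
        # os.cpu_count() can return None, so fall back to a single core.
        cpu_cores = os.cpu_count() or 1

    if profile == "development":
        return 1                  # single worker for simplicity
    if profile == "production-io":
        return 2                  # I/O-heavy workloads
    if profile == "production-cpu":
        return cpu_cores          # one worker per CPU core
    if profile == "high-availability":
        return 2 * cpu_cores      # over-provisioned; needs monitoring
    raise ValueError(f"unknown profile: {profile!r}")
```

For example, on an 8-core host this sketch returns 8 for `"production-cpu"` and 16 for `"high-availability"`.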
Capacity planning
You must plan for system capacity to ensure optimal performance and scalability. Capacity planning establishes the limits of vertical scaling and identifies the triggers for horizontal scaling. Understanding these factors helps maintain stability and prevent resource exhaustion during peak loads.
The following limits apply to vertical scaling:
- Memory per worker is approximately 100–200 MB as a baseline, plus additional request buffers.
- Adding workers beyond twice the number of CPU cores yields diminishing returns in CPU efficiency.
- Connection limits are approximately 1000 concurrent connections per worker when using asyncio.
Consider horizontal scaling when any of the following triggers occurs:
- CPU utilization remains above 70 percent after tuning the worker count.
- Memory pressure begins to affect response times.
- Network I/O saturation occurs, which is rare for typical MCP loads.
- There is a need for geographic distribution or high availability.
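As a rough aid, the approximate figures listed above can be turned into a back-of-the-envelope calculation. The sketch below assumes those figures hold for your deployment. The function names and the `buffer_mb_per_worker` parameter are hypothetical, and the request buffer size must come from your own measurements.

```python
# Approximate figures taken from the list above.
BASELINE_MB_PER_WORKER = (100, 200)   # low and high baseline estimates
CONNECTIONS_PER_WORKER = 1000         # approximate, with asyncio
CPU_TRIGGER_PERCENT = 70


def estimate_capacity(workers, buffer_mb_per_worker=0):
    """Estimate memory range (MB) and connection ceiling for one host."""
    low, high = BASELINE_MB_PER_WORKER
    return {
        "memory_mb_low": workers * (low + buffer_mb_per_worker),
        "memory_mb_high": workers * (high + buffer_mb_per_worker),
        "max_connections": workers * CONNECTIONS_PER_WORKER,
    }


def horizontal_scaling_reasons(cpu_percent, memory_pressure,
                               network_saturated, needs_geo_or_ha):
    """Return the list of horizontal scaling triggers that currently apply."""
    reasons = []
    if cpu_percent > CPU_TRIGGER_PERCENT:
        reasons.append("CPU above 70 percent after tuning worker count")
    if memory_pressure:
        reasons.append("memory pressure affecting response times")
    if network_saturated:
        reasons.append("network I/O saturation")
    if needs_geo_or_ha:
        reasons.append("geographic distribution or high availability")
    return reasons
```

With 4 workers and an assumed 50 MB of request buffers per worker, this estimate gives a memory range of 600 to 1000 MB and a ceiling of about 4000 concurrent connections. An empty list from `horizontal_scaling_reasons` suggests that vertical tuning is still sufficient.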