Hybrid Data Pipeline can be deployed on one or more nodes behind a load balancer to provide high availability and scalability. Hybrid Data Pipeline supports load balancers that use the TCP tunneling protocol and ones that use the WebSocket protocol. During deployment, network load balancing should be specified for TCP tunneling, or cloud load balancing should be specified for the WebSocket protocol.[1] You must also provide the hostname or IP address of the machine hosting the load balancer when configuring the server during deployment.

In addition to configuring Hybrid Data Pipeline to use a load balancer, the load balancer itself must be configured for your Hybrid Data Pipeline environment. The following criteria describe configuration requirements for the load balancer.

  • The load balancer must be configured to accept HTTPS connections on port 443 and unencrypted HTTP connections on port 80.
  • The load balancer must be configured for SSL termination to support client-side SSL communication between the load balancer and client applications. The load balancer may also be configured to support server-side SSL communication. See SSL configuration for details.
  • The load balancer must support session affinity. The load balancer must be configured either to supply its own cookies or to pass the cookies generated by the Hybrid Data Pipeline service back to the client. The Hybrid Data Pipeline service provides a cookie named C2S-SESSION that can be used by the load balancer.

    For ODBC and JDBC applications, the ODBC and JDBC drivers automatically use cookies for session affinity. OData applications should be configured to echo cookies for optimal performance. When OData applications cannot be configured to echo cookies, an internal mechanism called the distributed file persistence manager is used. See Client application configuration for details.

  • The load balancer must pass the hostname used to access the cluster in the Host header when it forwards a request to an individual Hybrid Data Pipeline node. For example, if the hostname used to access the cluster is hdp.mycorp.com and the individual nodes behind the load balancer have the hostnames hdpsvr1.mycorp.com, hdpsvr2.mycorp.com, and hdpsvr3.mycorp.com, then the Host header in the request forwarded to the Hybrid Data Pipeline node must be the load balancer hostname hdp.mycorp.com.
  • The load balancer must supply the X-Forwarded-Proto header to indicate to the Hybrid Data Pipeline node whether the request was received by the load balancer as an HTTP or HTTPS request.
  • The load balancer must supply the X-Forwarded-For header for IP address filtering. The X-Forwarded-For header is also required if the client IP address is needed for Hybrid Data Pipeline access logs. If the X-Forwarded-For header is not supplied, the IP address in the access logs will always be the load balancer's IP address.
  • The load balancer may be configured to run HTTP or HTTPS health checks against nodes with the Health Check API.
  • For OData queries, the load balancer should be configured to time out after 5 or more minutes. By default, Hybrid Data Pipeline gives OData requests just under 5 minutes (285 seconds) to respond. If the load balancer is configured with a shorter timeout, 504 Gateway Timeout errors may occur.

  • Additional configurations are required if you are using the On-Premises Connector to connect to on-premises data. These configurations depend on the type of load balancer you are using. See the following topics for details.
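
The header and timeout requirements in the list above can be sketched in a few lines of Python. This is only an illustration of what a conforming load balancer does when it forwards a request; it is not Hybrid Data Pipeline code or any load balancer's API. The function names, the PUBLIC_HOST constant, and the assumption that header names arrive in canonical capitalization are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical constants for illustration only.
PUBLIC_HOST = "hdp.mycorp.com"   # cluster hostname from the example above
MIN_LB_TIMEOUT_S = 300           # "5 or more minutes" for OData queries


def build_forwarded_headers(
    client_ip: str,
    scheme: str,
    incoming: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return the headers a load balancer would send to a backend node.

    Assumes header names in `incoming` use canonical capitalization.
    """
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")

    # Copy the client's headers so cookies (for example C2S-SESSION)
    # pass through unchanged.
    headers = dict(incoming or {})

    # Keep the cluster hostname, never the individual node's hostname.
    headers["Host"] = PUBLIC_HOST

    # Tell the node how the request reached the load balancer.
    headers["X-Forwarded-Proto"] = scheme

    # Append the client address to any existing X-Forwarded-For chain.
    prior = headers.get("X-Forwarded-For")
    headers["X-Forwarded-For"] = f"{prior}, {client_ip}" if prior else client_ip
    return headers


def timeout_is_safe(lb_timeout_s: int) -> bool:
    """True if the load balancer timeout is at least 5 minutes."""
    return lb_timeout_s >= MIN_LB_TIMEOUT_S
```

For instance, calling build_forwarded_headers("203.0.113.7", "https") yields a Host of hdp.mycorp.com, an X-Forwarded-Proto of https, and an X-Forwarded-For of 203.0.113.7. In a real deployment these behaviors are set through the load balancer's own configuration rather than written by hand.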
Important: In addition to SSL termination, it is strongly recommended that the load balancer be configured to mitigate Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks through other mechanisms. Such mechanisms may include rate limiting and connection throttling, IP reputation and blacklisting, SYN flood protection, application layer filtering, geo-blocking and geo-fencing, SSL offloading and inspection, and Web Application Firewall (WAF) integration.
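
Rate limiting, one of the mechanisms named above, is commonly implemented with a token bucket. The sketch below shows the idea only; the class name, parameters, and injectable clock are hypothetical, and production load balancers expose rate limiting as a configuration option rather than as code like this.

```python
import time
from typing import Callable


class TokenBucket:
    """Minimal token-bucket limiter (illustrative, not thread-safe)."""

    def __init__(
        self,
        rate_per_s: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_s      # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable for testing
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return whether the request passes."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A limiter of this kind would typically be kept per client IP address, which is one reason the X-Forwarded-For requirement matters when several proxies sit in front of the service.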
[1] The TCP tunneling protocol is typically used in network load balancing, and the WebSocket protocol is typically used in cloud load balancing. However, depending on the requirements of your load balancer, either protocol may be used with network or cloud load balancing.