Hybrid Data Pipeline is a self-hosted solution that can be deployed on a single node or on multiple nodes. Multi-node deployments require a load balancer; for single-node deployments, a load balancer is optional. Deploying Hybrid Data Pipeline on one or more nodes behind a load balancer is generally recommended. In either case, Hybrid Data Pipeline can be deployed on an in-house network or on a cloud service such as Amazon Web Services, Azure, or Google Cloud Services, so you can build your environment according to your organization's needs.
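
As a rough illustration, the load balancer rule above can be written as a small planning check. This is only a sketch: the function name `load_balancer_requirement` is hypothetical and is not part of Hybrid Data Pipeline or its installer.

```python
def load_balancer_requirement(node_count: int) -> str:
    """Return whether a load balancer is 'required' or 'optional'.

    Hypothetical helper that encodes the rule stated above:
    multi-node deployments require a load balancer, while for a
    single-node deployment it is optional (though generally recommended).
    """
    if node_count < 1:
        raise ValueError("a deployment needs at least one node")
    return "required" if node_count > 1 else "optional"


if __name__ == "__main__":
    for nodes in (1, 3):
        print(f"{nodes} node(s): load balancer {load_balancer_requirement(nodes)}")
```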

The following image shows the basic structure of an environment that uses Salesforce Connect and Hybrid Data Pipeline to consume external data.

[Image: Exposing external data with Hybrid Data Pipeline]

How you deploy Hybrid Data Pipeline depends in part on the location of the external data you want to expose. In the simplest scenario, your external data already resides in the cloud, and you simply deploy Hybrid Data Pipeline in the cloud.

In contrast, if your external data resides on a database behind a firewall, you have two options.

  • First, you could install Hybrid Data Pipeline behind the firewall, close to your data. In this case, you would need to use a VPN or other gateway to enable communication between Salesforce and Hybrid Data Pipeline.
  • Second, you could host Hybrid Data Pipeline in the cloud and access data behind a network firewall using the On-Premises Connector (OPC). The OPC is an agent that runs behind the firewall, close to your data. It uses outbound SSL to communicate with the Hybrid Data Pipeline server, so you do not need to open ports or set up a VPN or other gateway.
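
The deployment choices described above can be summarized in a short sketch that maps the location of the external data to the available options. All names here (`DataLocation`, `DeploymentOption`, `deployment_options`) are hypothetical and exist only for illustration; they do not correspond to any Hybrid Data Pipeline API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class DataLocation(Enum):
    CLOUD = "cloud"
    BEHIND_FIREWALL = "behind firewall"


@dataclass(frozen=True)
class DeploymentOption:
    server_location: str        # where the Hybrid Data Pipeline server runs
    needs_vpn_or_gateway: bool  # Salesforce reaches the server via VPN or gateway
    needs_opc: bool             # On-Premises Connector runs close to the data


def deployment_options(location: DataLocation) -> List[DeploymentOption]:
    """List the deployment options described above for a data location."""
    if location is DataLocation.CLOUD:
        # Simplest scenario: data and server both live in the cloud.
        return [DeploymentOption("cloud", False, False)]
    return [
        # Option 1: server behind the firewall, reached through a VPN or gateway.
        DeploymentOption("behind firewall", True, False),
        # Option 2: server in the cloud, OPC behind the firewall (outbound SSL).
        DeploymentOption("cloud", False, True),
    ]


if __name__ == "__main__":
    for option in deployment_options(DataLocation.BEHIND_FIREWALL):
        print(option)
```

Which option fits best depends on factors the sketch does not model, such as network policy and where the rest of your environment is hosted.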

Refer to the following documentation resources for more detailed deployment information.