To install Hybrid Data Pipeline on an EC2 instance, you must connect to the instance using an SSH client, copy the installation file to the instance, and then run the installer. After installation, iptables must be configured to accept traffic on the ports used by Hybrid Data Pipeline.

Access the EC2 instance with an SSH client

The AWS EC2 Dashboard offers a number of ways to connect to your EC2 instance. The following instructions focus on using an SSH client to access your EC2 instance.

Take the following steps to connect to your EC2 instance.

  1. Go to the EC2 Dashboard.
  2. Select the EC2 instance on which you are installing Hybrid Data Pipeline, and click Connect.
  3. If required, modify permissions on the PEM file associated with your EC2 instance.
    Note: To connect to your EC2 instance, the PEM file cannot be publicly viewable. How you change permissions depends, in part, on whether the PEM file is stored on a Windows or Linux machine.
    • Linux

      For Linux, run the chmod 400 command on the PEM file to ensure it is not publicly viewable. For example:

      chmod 400 hdp-on-aws.pem
    • Windows

      For Windows, depending on security configurations, you may need to disable inheritance on the PEM file. To disable inheritance, right-click the file and go to Properties > Security > Advanced > Permissions > Disable inheritance. Then remove all permission entries for the file, and add a permission entry where you are the principal with full control over the file.

  4. Access your instance with an SSH client by specifying the PEM file and the AWS public IPv4 DNS associated with the instance. For example:
    ssh -i "c:\hdp-aws\hdp-on-aws.pem" ec2-user@ec2-12-34-56-78.compute-1.amazonaws.com
    Note: For additional information, refer to Connect to your Linux instance using SSH in the Amazon EC2 User Guide for Linux Instances.
  5. Update your EC2 instance. For example:
    sudo yum update

Result

You have successfully connected to your EC2 instance with an SSH client. You may now proceed with copying the installation file to the instance and installing Hybrid Data Pipeline.
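
The permission check in step 3 can be scripted on a Linux or macOS client. The following is a minimal sketch, not part of the product or of any AWS tool: pem_is_private is a hypothetical helper name, and the key file name and host in the comments are the placeholder values used in the steps above.

```shell
#!/bin/sh
# pem_is_private FILE
# Succeeds only if FILE grants no permissions to group or others.
# (Hypothetical helper name, used here for illustration only.)
pem_is_private() {
    # Characters 5-10 of the ls mode string hold the group and other bits.
    perms=$(ls -ld -- "$1" 2>/dev/null | cut -c5-10)
    [ "$perms" = "------" ]
}

# Example use with the placeholder values from the steps above:
#   pem_is_private hdp-on-aws.pem || chmod 400 hdp-on-aws.pem
#   ssh -i hdp-on-aws.pem ec2-user@ec2-12-34-56-78.compute-1.amazonaws.com
```

The helper only inspects the file mode, so it does not apply to a PEM file stored on a Windows file system, where the inheritance settings described in step 3 control access instead.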

Copy the installation file to the EC2 instance

Take the following steps to obtain and copy the installation file to your EC2 instance.

  1. Go to the DataDirect Connectors Download page, complete the download form, and click DOWNLOAD.
  2. Download the server installation file. The following file will be downloaded.

    PROGRESS_DATADIRECT_HDP_SERVER_LINUX_64_INSTALL.bin

  3. Copy the installation file to your EC2 instance using an SCP client. For example:
    scp -i "c:\hdp-aws\hdp-on-aws.pem" "c:\hdp-aws\PROGRESS_DATADIRECT_HDP_SERVER_LINUX_64_INSTALL.bin" ec2-user@ec2-12-34-56-78.compute-1.amazonaws.com:/home/ec2-user/hdp
    Note: For additional information, refer to Transfer files to Linux instances using an SCP client in the Amazon EC2 User Guide for Linux Instances.
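
If you want to confirm that the installation file arrived intact, one option is to compare SHA-256 digests on both machines. This is an optional sketch, not a documented step: it assumes sha256sum (GNU coreutils) is available on the machine where you run it, and sha256_of and same_digest are hypothetical helper names.

```shell
#!/bin/sh
# sha256_of FILE
# Prints only the SHA-256 digest of FILE (sha256sum is part of GNU coreutils).
sha256_of() {
    sha256sum -- "$1" | cut -d ' ' -f 1
}

# same_digest A B
# Succeeds only if both digests are non-empty and identical.
same_digest() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example use: run sha256_of on the downloaded file on your client and on
# the copied file on the instance, then compare the two values:
#   same_digest "$local_sum" "$remote_sum" && echo "copy verified"
```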

Install Hybrid Data Pipeline

Important: Before you begin, it is recommended that you use a public DNS defined as a CNAME because the public DNS provided by AWS can change if your EC2 instance is rebooted. If the public DNS were to change, components such as the JDBC driver, the ODBC driver, and the On-Premises Connector would be unable to communicate with the Hybrid Data Pipeline server.

Take the following steps to install Hybrid Data Pipeline.

  1. Make the installation file executable with the following chmod command:
    chmod +x ./PROGRESS_DATADIRECT_HDP_SERVER_LINUX_64_INSTALL.bin
  2. Run the executable:
    ./PROGRESS_DATADIRECT_HDP_SERVER_LINUX_64_INSTALL.bin -i console
  3. If prompted, press ENTER to accept disabling port conflict validation.
  4. Review and accept the license agreement.
  5. For the install folder, accept the default:

    /home/ec2-user/Progress/DataDirect/Hybrid_Data_Pipeline/Hybrid_Server

  6. For installation license type, accept the default value: Evaluation.
  7. For hostname, enter the public DNS for your instance.
    Note: As noted above, it is recommended that you provide a public DNS defined as a CNAME.
    • Option 1. The public DNS defined as a CNAME. For example:

      example.hybridpipe.com

    • Option 2. The AWS public IPv4 DNS. For example:

      ec2-12-34-56-78.compute-1.amazonaws.com

  8. For Hostname validation, select 2 – Skip Validation.
  9. Select the installation type 2 – Custom.
    Note: Alternatively, you may select 1 - Typical. However, the typical installation type requires the use of a self-signed certificate generated by the installer. Also, it does not allow the use of the On-Premises Connector for connecting to on-premises data sources.
  10. For key location, accept the default, 2 – User default location.
  11. Set the password for the d2cadmin account.
  12. Set the password for the d2cuser account.
  13. For Java configuration, accept the default, 2 – No.
  14. For FIPS configuration, accept the default, 2 – No.
  15. For the certificate file, choose either the self-signed certificate option, or provide a path to the certificate you will be using.
    Note: For evaluation purposes, it is fine to use the self-signed certificate in most cases. However, it may be required that you supply a certificate from a well-known certificate authority. For example, if you are using Hybrid Data Pipeline to access external data with Salesforce Connect, a CA certificate file is required. In this scenario, the certificate must be supplied in the form of a PEM file. For more information about the PEM file format, refer to The PEM file in the Hybrid Data Pipeline User's Guide.
  16. For MySQL Community Edition, accept the default, 2 – No.
  17. For Database Type, accept the default, 1 – Internal Database.
  18. For ports, accept the default values.
    • Database Port: 9001
    • HDP Server HTTP Port: 8080
    • HDP Server HTTPS Port: 8443
  19. For On-Premises Settings, select one of the following options.
    • 1 – Yes. Select this option if you plan to use the On-Premises Connector to connect to data sources behind a firewall.
    • 2 – No. Select this option if you are not planning to connect to data sources behind a firewall. Then, skip to step 21.
  20. For On-Premises Ports, accept the default values.
    • On-Premises Port: 40501
    • Notification TCP Port: 11280
    • Notification SSL Port: 11443
  21. For Server Internal Ports, accept the default values.
    • Message Queue Port: 8282
    • Internal API Port: 8190
    • Internal API SSL Port: 8090
    • Shutdown Port: 8005
  22. Review the installation summary. If you are satisfied with your choices, press ENTER to install.
  23. After installation is complete, verify that Hybrid Data Pipeline processes are running by viewing process information with the following ps command.
    ps -ef|grep java

Result

You have installed Hybrid Data Pipeline on your EC2 instance.
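
In addition to checking for Java processes, you may want to confirm that the server is listening on the ports chosen in step 18. The following is a minimal sketch that assumes the ss utility is available on the instance and that the default ports were accepted; is_listening is a hypothetical helper name.

```shell
#!/bin/sh
# is_listening PORT
# Reads `ss -ltn` style output on standard input and succeeds if a
# listening TCP socket is bound to PORT. (Hypothetical helper name.)
is_listening() {
    awk -v port="$1" '
        # Field 4 of ss -ltn output is the local address, ending in :PORT.
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Example use on the instance, with the default HTTP and HTTPS ports:
#   for p in 8080 8443; do
#       if ss -ltn | is_listening "$p"; then
#           echo "port $p: listening"
#       else
#           echo "port $p: not listening"
#       fi
#   done
```

Reading the socket list from standard input keeps the helper independent of how the list is produced, so it can also be run against saved output.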

Install and configure iptables

Before you can open and start using Hybrid Data Pipeline, you must configure the firewall of your EC2 instance to accept connections on ports the server uses. A utility such as iptables may be used to configure the firewall of your EC2 instance.

The following commands were used to install and configure iptables on an Amazon Linux 2 instance.

Install, enable, and start

sudo yum install iptables-services -y
sudo systemctl enable iptables
sudo systemctl start iptables

Allow traffic on ports 8080 and 8443

sudo iptables -I INPUT -p tcp -m tcp --dport 8080 -j ACCEPT
sudo iptables -I INPUT -p tcp -m tcp --dport 8443 -j ACCEPT
sudo service iptables save

Allow traffic on ports 40501, 11280, and 11443 (Required for connectivity from Hybrid Data Pipeline to data sources behind a firewall)

sudo iptables -I INPUT -p tcp -m tcp --dport 40501 -j ACCEPT
sudo iptables -I INPUT -p tcp -m tcp --dport 11280 -j ACCEPT
sudo iptables -I INPUT -p tcp -m tcp --dport 11443 -j ACCEPT
sudo service iptables save

Result

You have configured iptables on your EC2 instance to accept Hybrid Data Pipeline traffic. You can now open and begin using Hybrid Data Pipeline.
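
The per-port commands above follow a single pattern, so they can be generated from a list of ports. The following is a minimal sketch under that assumption: print_accept_rules is a hypothetical helper name, and it only prints the commands so that you can review them before running anything.

```shell
#!/bin/sh
# print_accept_rules PORT...
# Prints (does not run) one iptables ACCEPT command per port, using the
# same rule form as the commands above. (Hypothetical helper name.)
print_accept_rules() {
    for port in "$@"; do
        printf 'sudo iptables -I INPUT -p tcp -m tcp --dport %s -j ACCEPT\n' "$port"
    done
}

# Example use on the instance:
#   print_accept_rules 8080 8443 40501 11280 11443        # review the commands
#   print_accept_rules 8080 8443 40501 11280 11443 | sh   # apply them
#   sudo service iptables save                            # persist the rules
#   sudo iptables -L INPUT -n --line-numbers              # list the INPUT chain
```

Omit ports 40501, 11280, and 11443 from the list if you are not using the On-Premises Connector.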