To deploy Hybrid Data Pipeline on AWS, you must create an EC2 instance to host the Hybrid Data Pipeline service. As with any AWS environment, the EC2 instance must reside in an Amazon Virtual Private Cloud (VPC).

Create a VPC

If you do not already have a VPC with a public subnet, create one in which to deploy the EC2 instance.

Take the following steps to create a VPC.

  1. Go to the VPC Dashboard.
  2. Click Launch VPC Wizard.
  3. Select VPC with a Single Public Subnet, and click Select.
  4. Enter a name for the VPC.
  5. Click Create VPC.
  6. Record the name and ID of the VPC for reference.

Result

  • The VPC is created with IPv4 CIDR 10.0.0.0/16.
  • A subnet named Public Subnet is created in the VPC with IPv4 CIDR 10.0.0.0/24.
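
The relationship between the two CIDR blocks above can be checked with Python's standard ipaddress module. The following is a minimal sketch for illustration only; the variable names are not part of Hybrid Data Pipeline or AWS. It confirms that the subnet range falls inside the VPC range and reports the raw size of each block.

```python
import ipaddress

# CIDR blocks reported in the Result list above.
vpc_block = ipaddress.ip_network("10.0.0.0/16")
public_subnet = ipaddress.ip_network("10.0.0.0/24")

# The subnet must fall entirely inside the VPC's address range.
subnet_fits = public_subnet.subnet_of(vpc_block)

# Raw address counts. AWS reserves a few addresses in every subnet,
# so the number of usable addresses is slightly lower.
vpc_size = vpc_block.num_addresses
subnet_size = public_subnet.num_addresses

print(subnet_fits, vpc_size, subnet_size)
```

A /16 VPC leaves room for additional /24 subnets if the deployment later grows beyond a single public subnet.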

Create an EC2 instance

Take the following steps to create an EC2 instance.

  1. Go to the EC2 Dashboard.
  2. Select Launch instances.
  3. Select Choose AMI, and select your preferred Amazon Machine Image.
  4. Select Choose Instance Type, and select your preferred instance type.
    Note: The Hybrid Data Pipeline server must be installed on a 64-bit Linux machine with, at minimum, 4 cores and 8 GB of RAM.
  5. Select Configure Instance Details and enter details.

    The default values work for most single-node deployments without a load balancer, except for the following items:

    • Number of instances: 1
    • Network: the name of the VPC in which this instance will reside
    • Subnet: Public Subnet
    • Auto-assign Public IP: Enable
  6. Select Add Storage and enter the amount of storage you want to make available.
    Note: At least 100 GB of storage capacity is recommended.
  7. Optional. Select Add Tags, and create a key-value pair to name the instance. For example: Name | HDP-Instance.
  8. Select Configure Security Group, and select Create new security group.
  9. Enter a security group name and description, and add the following rules to the security group.
    Note: The rules below use 0.0.0.0/0 as the Source, which allows traffic from any IPv4 address. For security purposes, you may instead provide a specific IP address or IP range for the Source property.
    • Rule 1
      • Type: SSH
      • Port range: 22
      • Source: Custom | 0.0.0.0/0
      • Description: Allow SSH traffic
    • Rule 2
      • Type: Custom TCP
      • Port range: 8080
      • Source: Custom | 0.0.0.0/0
      • Description: Allow inbound traffic on port 8080
    • Rule 3
      • Type: Custom TCP
      • Port range: 8443
      • Source: Custom | 0.0.0.0/0
      • Description: Allow inbound traffic on port 8443
    • Rule 4 (Required for connectivity from Hybrid Data Pipeline to data sources behind a firewall)
      • Type: Custom TCP
      • Port range: 40501
      • Source: Custom | 0.0.0.0/0
      • Description: Allow inbound traffic on port 40501
    • Rule 5 (Required for connectivity from Hybrid Data Pipeline to data sources behind a firewall)
      • Type: Custom TCP
      • Port range: 11280
      • Source: Custom | 0.0.0.0/0
      • Description: Allow inbound traffic on port 11280
    • Rule 6 (Required for connectivity from Hybrid Data Pipeline to data sources behind a firewall)
      • Type: Custom TCP
      • Port range: 11443
      • Source: Custom | 0.0.0.0/0
      • Description: Allow inbound traffic on port 11443
  10. Click Review, and review the attributes of the EC2 instance.
  11. Click Launch to launch the EC2 instance.
  12. When prompted, select or create a key pair (PEM file).
    • Option 1. Create a new PEM file, for example, hdp-instance.pem, and save it to a secure location. You will use this PEM file to access your EC2 instance with an SSH client.
    • Option 2. Use a PEM file from a previous deployment instead of creating a new one. You must still have access to the file, because you will need to specify it when accessing your EC2 instance with an SSH client.
  13. Review the instance in the EC2 console. Record the name and ID of the instance, as well as the name and ID of the security group associated with it, in case you need to refer to them later.
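
The six security group rules in step 9 can also be expressed as data, which makes it easier to review them or to restrict the Source to a narrower range. The following is a minimal sketch, not an official procedure. The names HDP_INGRESS_RULES and build_ip_permissions are hypothetical. The dictionary layout is assumed to match the IpPermissions structure accepted by the EC2 AuthorizeSecurityGroupIngress API (for example, through boto3's authorize_security_group_ingress). The sketch only builds the structure and makes no AWS call.

```python
import ipaddress

# Ports and descriptions taken from the rules listed in step 9.
HDP_INGRESS_RULES = [
    (22, "Allow SSH traffic"),
    (8080, "Allow inbound traffic on port 8080"),
    (8443, "Allow inbound traffic on port 8443"),
    (40501, "Allow inbound traffic on port 40501"),
    (11280, "Allow inbound traffic on port 11280"),
    (11443, "Allow inbound traffic on port 11443"),
]


def build_ip_permissions(rules, source_cidr="0.0.0.0/0"):
    """Build one TCP ingress entry per (port, description) pair.

    The default source matches the rules above; pass a narrower CIDR
    to restrict access to a known address range.
    """
    # Raises ValueError if source_cidr is not a syntactically valid network.
    ipaddress.ip_network(source_cidr, strict=False)
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": source_cidr, "Description": description}],
        }
        for port, description in rules
    ]
```

For example, build_ip_permissions(HDP_INGRESS_RULES, "203.0.113.0/24") produces the same six rules limited to a single address range (203.0.113.0/24 is a placeholder range reserved for documentation).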

Result

You have created an EC2 instance to host the Hybrid Data Pipeline service.
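
To access the instance with the PEM file from step 12, OpenSSH clients generally require that the private key file not be readable by other users. The following is a minimal sketch of preparing the key and assembling the SSH command. The function name prepare_ssh_command is hypothetical, and the default login name ec2-user is an assumption that applies to Amazon Linux AMIs; other AMIs use different default users.

```python
import os
import stat


def prepare_ssh_command(pem_path, host, user="ec2-user"):
    """Restrict the key file to owner read-only and return an ssh argument list.

    The default user is an assumption (Amazon Linux); adjust it for your AMI.
    """
    # Equivalent to `chmod 400 <pem_path>`.
    os.chmod(pem_path, stat.S_IRUSR)
    return ["ssh", "-i", pem_path, f"{user}@{host}"]
```

The returned list can be passed to subprocess.run, or joined with spaces and run in a terminal. The host argument is the public IP address or public DNS name of the instance shown in the EC2 console.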