Before deploying Hybrid Data Pipeline on AKS, you must set up an AKS environment. Complete the following tasks to set up the environment.

Create an AKS resource group

Take the following steps to create an AKS resource group via the Azure portal.

  1. From the Azure portal home page, click Resource groups.
  2. Click Create.
  3. From the Basics tab, select a value for the Subscription parameter, enter a resource group name, select a region, and click Next.
  4. On the Tags tab, enter tags if required by your organization's Azure policy.
  5. Click Next.
  6. From the Review + create tab, review the settings and click Create to create the resource group.

Result: After a few moments, Azure generates the resource group.
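
As an alternative to the portal, the resource group may be created from the command line. The following is a minimal sketch rather than a required procedure. It assumes that the Azure CLI is installed and that you have signed in with az login. The function name create_resource_group is hypothetical, as are the example values hdp-rg and eastus.

```shell
# Create an Azure resource group with the Azure CLI.
#   $1 = name for the new resource group (hypothetical example: hdp-rg)
#   $2 = Azure region for the resource group (hypothetical example: eastus)
create_resource_group() {
  if [ "$#" -ne 2 ]; then
    echo "usage: create_resource_group <name> <region>" >&2
    return 1
  fi
  az group create --name "$1" --location "$2"
}
```

For example, calling create_resource_group hdp-rg eastus runs az group create --name hdp-rg --location eastus.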

Create an AKS cluster

Take the following steps to create an AKS cluster via the Azure portal.

  1. From the Azure portal home page, click Kubernetes services.
  2. Click Create, and then select Kubernetes cluster from the drop-down menu.
  3. From the Basics tab, provide information for the following parameters and click Next:

    • Subscription: This should be the same subscription you used to create the resource group.
    • Resource group: This should be the resource group you created above.
    • Cluster preset configuration: Select your preferred configuration.
    • Kubernetes cluster name: Enter a name for the cluster.
  4. From the Node pools tab, configure the node pool.

    1. Open the default node pool agentpool, or click Add node pool.
    2. Specify values for the following parameters.

      • Node size: The value you specify should account for the number of Hybrid Data Pipeline pods, the PostgreSQL system database, and ingress resources. See Allocating resources for more information.
      • Scale method: Select your preferred configuration.
      • Node count: Select your preferred configuration.
    3. Save or add the node pool configuration.
    4. Select the node pool to be used with the cluster.
  5. Click Next.
  6. From the Networking tab, select Azure CNI Node Subnet for Network configuration.
  7. Click Next to review the following tabs and provide the necessary information:
    • Integrations
    • Monitoring
    • Advanced
    • Tags
    Note: You may need to check with your Azure administrator about required settings for your organization.
  8. From the Review + create tab, review the settings and click Create to create the cluster.

Result: After a few moments, Azure generates the cluster and a corresponding managed-cluster resource group. By default, the name of the managed-cluster resource group takes the following format:

MC_resource-group-name_cluster-name_region
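
If you are unsure of the generated name, you may look it up rather than construct it by hand. The following is a minimal sketch, assuming the Azure CLI is installed and signed in. The function name get_managed_resource_group is hypothetical. nodeResourceGroup is the property in the az aks show output that holds the name of the managed-cluster resource group.

```shell
# Print the managed-cluster (node) resource group that Azure generated for a cluster.
#   $1 = resource group you created for the cluster
#   $2 = AKS cluster name
get_managed_resource_group() {
  az aks show --resource-group "$1" --name "$2" \
    --query nodeResourceGroup --output tsv
}
```

The value printed is the resource group to open when you later look for resources that Azure created alongside the cluster.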

Add context for the Azure CLI

To use the Azure CLI, you must add the Kubernetes config file for your AKS cluster to your environment.

  1. Open Azure Cloud Shell (Bash) or a command-line terminal.
  2. If you are using a command-line terminal, run the command az login and proceed with Azure authorization.
  3. Run the following az aks get-credentials command.
    az aks get-credentials --resource-group resource_group --name aks_cluster

    where:

    resource_group is the name of the resource group you created.

    aks_cluster is the name of the AKS cluster.

Result: Kubernetes configuration context has been added to your environment.
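
To confirm that the context was added, you may compare the active kubectl context with the one you expect. The following is a minimal sketch, assuming kubectl is installed. The function name check_context is hypothetical. It also assumes the default behavior of az aks get-credentials, which names the context after the cluster unless you override it.

```shell
# Confirm that kubectl currently points at the expected context.
#   $1 = expected context name (by default, the AKS cluster name)
check_context() {
  current="$(kubectl config current-context)" || return 1
  if [ "$current" = "$1" ]; then
    echo "kubectl context is set to $1"
  else
    echo "unexpected kubectl context: $current" >&2
    return 1
  fi
}
```

For example, check_context aks_cluster returns a nonzero status if a different cluster is currently selected.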

Create a Kubernetes namespace

After adding the Kubernetes context to your environment, you may create a Kubernetes namespace. A namespace is an effective way to organize and manage Kubernetes resources.

Use the following command to create a namespace.

kubectl create namespace namespace-value

where namespace-value is the value you specify for the namespace.

After you have created a namespace, you may create resources in the namespace. In the following example, the Secret defined in the hdp-secrets.yaml file is created in the hdp-project namespace.

kubectl create -f hdp-secrets.yaml --namespace hdp-project

Important: Any Kubernetes Secrets used in the Helm deployment must be created in the same namespace used to deploy the Helm chart.
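
Kubernetes rejects namespace names that are not valid DNS labels: at most 63 characters, consisting of lowercase letters, digits, and hyphens, and beginning and ending with a letter or digit. The following is a minimal sketch that checks a name before creating the namespace. The function names is_valid_namespace and create_namespace are hypothetical helpers, not part of kubectl.

```shell
# Return 0 if $1 is a valid namespace name (an RFC 1123 DNS label, 1 to 63 characters).
is_valid_namespace() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$'
}

# Validate the name, then create the namespace with kubectl.
#   $1 = namespace name (for example, hdp-project)
create_namespace() {
  if is_valid_namespace "$1"; then
    kubectl create namespace "$1"
  else
    echo "invalid namespace name: $1" >&2
    return 1
  fi
}
```

With this sketch, create_namespace hdp-project calls kubectl, while a name such as HDP_Project is rejected before any request is sent to the cluster.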

Create an application gateway

Take the following steps to create an application gateway.

  1. From the Azure portal home page, click Application gateways.
  2. Click Create.
  3. From the Basics tab, provide information for the following parameters and click Next:

    • Subscription: This should be the same subscription you used to create the resource group and the cluster.
    • Application gateway name: Enter a name for the application gateway.
    • Enable autoscaling: Select your preferred configuration.
    • Instance count: Select your preferred configuration.
    • Virtual network: Select the virtual network associated with the managed cluster.
    • Subnet: Select the subnet you want to use. In some cases, you may need to add a default subnet.
  4. From the Frontends tab, add a public IP address.

    1. For Frontend IP address type, select Public.
    2. For Public IPv4 address, select Add new.
    3. Enter a name for the public IP, and click OK.
    4. Click Next to move to the Backends tab.
  5. From the Backends tab, add a backend pool.

    Note: The backend pool you create here is a placeholder. It is not the backend pool that will be used with Hybrid Data Pipeline. However, creating it allows for the deployment of the backend pools that are used with Hybrid Data Pipeline.
    1. Click Add a backend pool.
    2. Enter a name for the backend pool.
    3. Click Add.
    4. Click Next to move to the Configuration tab.
  6. From the Configuration tab, configure a routing rule.

    1. Click Add a routing rule.
    2. Enter the rule name.
    3. Enter a priority such as 100.
    4. From the Listener tab, enter a listener name.
    5. Click on the Backend targets tab.
    6. From the Backend targets tab, provide values for the following parameters:
      • Backend target: Select the backend pool that you created in Step 5.
      • Backend settings: Select Add new, enter a name, and click Add.
    7. Click Add.
    8. Click Next to move to the Tags tab.
  7. On the Tags tab, enter tags if required by your organization's Azure policy.
  8. Click Next.
  9. From the Review + create tab, review the settings and click Create to create the application gateway.

Result: After a few moments, Azure generates the application gateway, a public IP, and a backend pool.

Enable the Application Gateway ingress controller

Take the following steps to enable the Application Gateway ingress controller.

  1. From the Azure portal home page, select the AKS cluster you created.
  2. Under Settings, click Networking.
  3. Under the Virtual network integration tab, click Manage.
  4. In the Application Gateway ingress controller panel, tick the Ingress controller checkbox and select the application gateway you have created.

Result: The Application Gateway ingress controller has been enabled.
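
The ingress controller may also be enabled from the command line through the ingress-appgw add-on. The following is a minimal sketch, assuming the Azure CLI is installed and signed in and that the application gateway already exists. The function name enable_agic is hypothetical, and all four arguments are placeholders for your own values.

```shell
# Enable the Application Gateway ingress controller add-on for an AKS cluster.
#   $1 = resource group of the AKS cluster
#   $2 = AKS cluster name
#   $3 = resource group of the application gateway
#   $4 = application gateway name
enable_agic() {
  # Look up the resource ID of the existing application gateway.
  appgw_id="$(az network application-gateway show \
    --resource-group "$3" --name "$4" --query id --output tsv)" || return 1
  if [ -z "$appgw_id" ]; then
    echo "application gateway $4 not found in $3" >&2
    return 1
  fi
  # Attach the gateway to the cluster through the ingress-appgw add-on.
  az aks enable-addons --resource-group "$1" --name "$2" \
    --addons ingress-appgw --appgw-id "$appgw_id"
}
```

The gateway is looked up by name first because the add-on expects the full resource ID of the gateway rather than its name.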

Obtain the load balancer hostname

When configuring your deployment, you must use the loadbalancer.hostName parameter in the values.yaml file to specify a fully qualified domain name (FQDN). The FQDN is the public-facing address used to call the Hybrid Data Pipeline service.

When you create an application gateway, Azure creates a subdomain that may be used for ingress to the AKS cluster. The DNS name for this subdomain is the FQDN that may be used to call the Hybrid Data Pipeline service.

Take the following steps to obtain the application gateway DNS name or FQDN.

  1. From the Azure portal, open the managed-cluster resource group that was created with your AKS cluster.
  2. From the managed-cluster resource group page, select the public IP address created with the application gateway.
  3. Copy the value in the DNS name field. This is the FQDN. For example:

    hdp-ingress.eastus.cloudapp.azure.com

    Note: You may modify the DNS name by navigating to Settings > Configurations > DNS name label.

Next steps: After you obtain the DNS name or FQDN associated with the application gateway public IP, you may use it to set the loadbalancer.hostName parameter in the values.yaml file. For example:

  loadbalancer:
    hostName: hdp-ingress.eastus.cloudapp.azure.com
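
The same lookup may be scripted. The following is a minimal sketch, assuming the Azure CLI is installed and signed in. The function names get_public_ip_fqdn and render_hostname_values are hypothetical. dnsSettings.fqdn is the property of the public IP resource that holds the DNS name, and it is empty when no DNS name label has been set.

```shell
# Print the DNS name (FQDN) assigned to a public IP address.
#   $1 = resource group that contains the public IP
#   $2 = name of the public IP
get_public_ip_fqdn() {
  az network public-ip show --resource-group "$1" --name "$2" \
    --query dnsSettings.fqdn --output tsv
}

# Print a values.yaml fragment that sets loadbalancer.hostName to the FQDN in $1.
render_hostname_values() {
  if [ -z "${1:-}" ]; then
    echo "no FQDN supplied; check that the public IP has a DNS name label" >&2
    return 1
  fi
  printf 'loadbalancer:\n  hostName: %s\n' "$1"
}
```

For example, passing the output of get_public_ip_fqdn to render_hostname_values prints a fragment that you may copy into the values.yaml file.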

Separate DNS server setup

Instead of using the FQDN generated by Azure, you may set up a DNS server for external access to Hybrid Data Pipeline. For example, in Azure, you may create a record set of type A (address record) in your DNS zone. In turn, the name of this DNS record set would be the value of the loadbalancer.hostName parameter in the values.yaml file. In this scenario, you must map the Kubernetes ingress IP address to the DNS server after executing the Helm install command. This procedure is described in Mapping the ingress IP address to the DNS server.

Set up a private Docker image registry

Optionally, you may configure the Hybrid Data Pipeline Helm chart to pull a Hybrid Data Pipeline Docker image or a PostgreSQL Docker image from a private registry. In this scenario, you must first create a Kubernetes Secret that provides the credentials for accessing the private registry, and then configure the values.yaml file. If you are using a separate private registry for each image, you must create a separate Secret for each registry.

Take the following steps to create a Secret for a private registry:

  1. Run the following command to create the Secret:
    Important: The Secret must be created in the same namespace used to deploy the Helm chart.

    Secret for the Hybrid Data Pipeline Docker image

    kubectl create secret docker-registry hdp-docker-registry --docker-server=hdpdockerreg \
      --docker-username=hdpdockeruser --docker-password=hdps3cr3t \
      --docker-email=hdpdockeruser@example.com --namespace namespace-value

    Secret for the PostgreSQL Docker image

    kubectl create secret docker-registry psql-docker-registry --docker-server=psqldockerreg \
      --docker-username=psqldockeruser --docker-password=psqls3cr3t \
      --docker-email=psqldockeruser@example.com --namespace namespace-value
    
  2. Provide image information and the name of the Secret in the values.yaml file. For example:

    Hybrid Data Pipeline image

      image:
        registry: hdpreg.example.com
        repository: hdpreg.example.com/hdp/hdp-docker-5.0.0
        tag: 3113
        digest: ""
        pullPolicy: IfNotPresent
      imagePullSecrets:
        - hdp-docker-registry
    

    PostgreSQL image

      image:
        registry: psqlreg.example.com
        repository: psqlreg.example.com/database/postgresql
        tag: 16.6.0
        digest: ""
        pullPolicy: IfNotPresent
        pullSecrets:
          - psql-docker-registry
    
Note:
  • By default, the Hybrid Data Pipeline Helm chart is configured to pull the PostgreSQL 16.6.0 Docker image from the public Bitnami PostgreSQL Docker repository. When using a private registry, image information must be updated accordingly.
  • The postgresql.global.security.allowInsecureImages parameter is set to false by default. However, if your environment does not require strict enforcement of image verification policies, you may set this parameter to true.
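
Before deploying, you may want to confirm that each registry Secret exists in the deployment namespace. The following is a minimal sketch, assuming kubectl is installed and pointed at your cluster. The function name check_pull_secret is hypothetical. kubernetes.io/dockerconfigjson is the type Kubernetes assigns to Secrets created with kubectl create secret docker-registry.

```shell
# Check that an image pull Secret exists in the namespace and has the expected type.
#   $1 = Secret name (for example, hdp-docker-registry)
#   $2 = namespace used to deploy the Helm chart
check_pull_secret() {
  secret_type="$(kubectl get secret "$1" --namespace "$2" \
    -o jsonpath='{.type}')" || return 1
  if [ "$secret_type" = "kubernetes.io/dockerconfigjson" ]; then
    echo "$1 is a registry Secret in namespace $2"
  else
    echo "$1 has unexpected type: $secret_type" >&2
    return 1
  fi
}
```

A nonzero return status indicates that the Secret is missing from the namespace or was created as a different Secret type.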

Create Kubernetes Secrets for required credentials

Two Kubernetes Secrets must be created to store credentials for the deployment and operation of Hybrid Data Pipeline. First, a Secret must be created to store the credentials of the PostgreSQL Privileged User. Second, a Secret must be created to store credentials for (1) Hybrid Data Pipeline server users and (2) Hybrid Data Pipeline system database users. All credentials should be secured and handled as sensitive information. For details on creating these Secrets, see Creating Kubernetes Secrets for required credentials.

Create a Kubernetes TLS Secret

TLS is required for ODBC, JDBC, and on-premises connectivity. Therefore, a Kubernetes TLS Secret must be created to support these types of connections. The Kubernetes TLS Secret stores the TLS certificate and its associated private key. For information on creating the TLS Secret, see Creating a Kubernetes TLS Secret.
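
As an illustration only, a TLS Secret can be created from a certificate file and its private key file with kubectl create secret tls. The following is a minimal sketch, assuming kubectl is installed and that you already have PEM-encoded certificate and key files. The function name create_tls_secret and the example Secret name hdp-tls are hypothetical. The referenced topic remains the authoritative procedure, including any Secret name the Helm chart expects.

```shell
# Create a Kubernetes TLS Secret from a certificate file and its private key file.
#   $1 = Secret name (hypothetical example: hdp-tls)
#   $2 = path to the PEM-encoded certificate
#   $3 = path to the PEM-encoded private key
#   $4 = namespace used to deploy the Helm chart
create_tls_secret() {
  # Fail early if either input file is missing or unreadable.
  for f in "$2" "$3"; do
    if [ ! -r "$f" ]; then
      echo "cannot read file: $f" >&2
      return 1
    fi
  done
  kubectl create secret tls "$1" --cert="$2" --key="$3" --namespace "$4"
}
```

As with the other Secrets, the TLS Secret in this sketch is created in the same namespace used to deploy the Helm chart.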

Next steps

Now that you have set up an AKS environment, you may proceed with Deploying Hybrid Data Pipeline on AKS.