A Hybrid Data Pipeline Helm chart deployment includes a Hybrid Data Pipeline cluster, a persistent volume for shared files, a persistent volume for logs, and a PostgreSQL system database. The following sections describe resource requirements for these services and how to allocate resources for them.

Note:
  • For memory and storage size, values must be specified in Mi (mebibytes) or Gi (gibibytes). For example, 4096Mi or 4Gi.
  • For CPU, values must be specified in millicores (m) or whole cores. For example, 2000m or 2.

Hybrid Data Pipeline cluster

A Hybrid Data Pipeline cluster consists of one or more pods running a containerized version of the Hybrid Data Pipeline service. By default, the number of pods is set to 2.

  replicaCount: 2

The following memory and CPU configurations are the default settings for each pod.

  resources:
    requests:
      memory: 4096Mi
      cpu: 2000m
    limits:
      memory: 4096Mi
      cpu: 2000m

Important: These default settings are minimum allocations to facilitate deployment of Hybrid Data Pipeline. The recommended memory allocation for running the Hybrid Data Pipeline service is 100 Gi. However, you may decrease or increase this memory allocation depending on the server load.

To change these settings, you must update the values in the manifest file and run a Helm upgrade. For details, see Upgrading the Helm chart.
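As an illustration, a manifest that scales the cluster to three pods and doubles the memory for each pod might look like the following sketch. The parameter names are those shown above. The values (3 replicas, 8Gi) are assumptions chosen only for illustration and should be sized for your own server load.

```yaml
# Illustrative values only; size these for your own workload.
replicaCount: 3          # default is 2

resources:
  requests:
    memory: 8Gi          # default is 4096Mi
    cpu: 2000m
  limits:
    memory: 8Gi          # kept equal to the request, as in the defaults
    cpu: 2000m
```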

Persistent volume for shared files

The persistent volume for shared files is used to store files that are shared across the deployment, including key and truststore files (see also Shared files and the key location). By default, 1 Gi is allocated for the persistent volume. As shown in the following example, a storageClassName must also be specified.

  persistence:
    keystore:
      mountPath: /hdpshare
      size: 1Gi
      storageClassName: azurefile-csi

Since the size and number of shared files are relatively static, there is usually no reason to change the default size of the persistent volume claim. However, if you want to resize the persistent volume, you have two options:

  • Azure supports resizing persistent volumes with no downtime. Refer to Resize a persistent volume in the Azure Kubernetes Service (AKS) documentation.
  • Alternatively, you may update the size allocation in the manifest file and run a Helm upgrade. For details, see Upgrading the Helm chart.

Persistent volume for logs

The persistent volume for logs is used to store system logs for the Hybrid Data Pipeline server (see also System logs). By default, 1 Gi is allocated for the persistent volume claim. As shown in the following example, a storageClassName must also be specified.

  persistence:
    ...
    logs:
      enabled: true
      mountPath: /logs
      size: 1Gi
      storageClassName: azurefile-csi

Important: This default setting is a minimum allocation to facilitate deployment of Hybrid Data Pipeline. The recommended allocation for the logs volume is 10 Gi. However, depending on server load, logging levels, and log cleanup, you may decrease or increase the storage allotted.

You have two options for updating this allocation.

  • Azure supports resizing persistent volumes with no downtime. Refer to Resize a persistent volume in the Azure Kubernetes Service (AKS) documentation.
  • Alternatively, you may update the size allocation in the manifest file and run a Helm upgrade. For details, see Upgrading the Helm chart.
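For the manifest option, a sketch of the logs entry raised to the recommended 10 Gi is shown here. It assumes the same keys and storage class as the example above, and that the other persistence settings are left unchanged.

```yaml
persistence:
  # other persistence settings (for example, keystore) left unchanged
  logs:
    enabled: true
    mountPath: /logs
    size: 10Gi                       # raised from the 1Gi default
    storageClassName: azurefile-csi
```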

PostgreSQL system database

The PostgreSQL database functions as a system database to store information for the operation of Hybrid Data Pipeline. By default, the Hybrid Data Pipeline Helm chart is configured to deploy a PostgreSQL system database using the Bitnami PostgreSQL Helm chart. For system database configuration, the Hybrid Data Pipeline Helm chart manifest includes parameters from the PostgreSQL Helm chart manifest. These parameters may be configured according to your deployment requirements.

CPU and memory

You may customize CPU and memory by updating the following resource parameters in the postgresql.primary section of the Hybrid Data Pipeline manifest file.

  resources:
    requests:
      cpu: 2
      memory: 4Gi
    limits:
      cpu: 4
      memory: 8Gi

Concurrent connections

You may also customize the maximum number of concurrent connections to the PostgreSQL database by updating the value of the POSTGRESQL_MAX_CONNECTIONS environment variable via the extraEnvVars parameter in the postgresql.primary section of the manifest file.

  extraEnvVars:
    - name: POSTGRESQL_MAX_CONNECTIONS
      value: "400"

The default value "400" is suitable for most scenarios. However, depending on the number of replicas in your cluster, you may need to decrease or increase this value.

PostgreSQL persistent volume

You may also change the size of the persistent volume claim for the PostgreSQL database with the persistence.size parameter in the postgresql.primary section of the manifest file.

  persistence:
    size: 8Gi

Note: You may customize other configurations by incorporating parameters from the PostgreSQL values.yaml manifest into the Hybrid Data Pipeline values.yaml manifest. The Bitnami PostgreSQL values.yaml manifest file is available in the Bitnami PostgreSQL GitHub repository:

https://github.com/bitnami/charts/blob/main/bitnami/postgresql/values.yaml
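To show how the PostgreSQL settings in this section nest, the following sketch combines them under postgresql.primary. The values are the ones used in the individual examples. The exact placement within your manifest is an assumption based on the section name and should be checked against your chart version.

```yaml
postgresql:
  primary:
    # CPU and memory
    resources:
      requests:
        cpu: 2
        memory: 4Gi
      limits:
        cpu: 4
        memory: 8Gi
    # Maximum concurrent connections (extraEnvVars is a list of name/value pairs)
    extraEnvVars:
      - name: POSTGRESQL_MAX_CONNECTIONS
        value: "400"
    # Persistent volume claim size
    persistence:
      size: 8Gi
```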

How to apply configurations

After you update the Hybrid Data Pipeline manifest file, you must run a Helm upgrade to apply new configurations. For details, see Upgrading the Helm chart.