The Hybrid Data Pipeline Helm chart supports affinity and topology spread constraints for advanced pod scheduling, giving you fine-grained control over where application pods are placed within your Kubernetes cluster. Used together, they help distribute pods for high availability and fault tolerance. Affinity rules control which nodes a pod can be scheduled on, based on node labels and on the labels of pods already running on those nodes. Topology spread constraints distribute pods evenly across failure domains such as zones, regions, or nodes.

The following affinity types are supported:

  • Node affinity for scheduling pods on specific nodes
  • Pod affinity for co-locating pods with related workloads
  • Pod anti-affinity for distributing pods across failure domains
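
Node affinity and pod anti-affinity appear in the examples later in this section. For pod affinity, a minimal sketch might look like the following. It assumes that hdp.affinity accepts a standard Kubernetes affinity block, as the later example suggests. The app.kubernetes.io/name: cache label is a hypothetical workload used only for illustration.

```yaml
hdp:
  affinity:
    podAffinity:
      # Soft rule: prefer nodes that already run the related workload
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        podAffinityTerm:
          labelSelector:
            matchLabels:
              # Hypothetical label; replace with the labels of your workload
              app.kubernetes.io/name: cache
          # Co-locate on the same node; use topology.kubernetes.io/zone
          # to co-locate within the same zone instead
          topologyKey: kubernetes.io/hostname
```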

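Advanced pod scheduling is available for Hybrid Data Pipeline server pods and for PostgreSQL system database instances. An example for each follows.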

Hybrid Data Pipeline server pods

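The following example configures required node affinity to schedule Hybrid Data Pipeline pods on a specific agent pool (hdppool, selected through the Azure Kubernetes Service node label kubernetes.azure.com/agentpool), preferred pod anti-affinity to spread pods across nodes, and topology spread constraints to distribute pods evenly across nodes (enforced) and zones (best effort).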


hdp:
  # Affinity configuration
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.azure.com/agentpool
            operator: In
            values:
            - hdppool
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/name
              operator: In
              values:
              - hdp
          topologyKey: kubernetes.io/hostname

  # Topology spread constraints
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: hdp
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: hdp
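
The pod anti-affinity rule above is a preference, so the scheduler may still place two pods on the same node. If each pod must run on a separate node, the rule can be written as a requirement instead. The following sketch uses the standard Kubernetes field layout and reuses the app.kubernetes.io/name: hdp label from the example above. Note that the required form lists its terms directly, without weight or podAffinityTerm.

```yaml
hdp:
  affinity:
    podAntiAffinity:
      # Hard rule: never place two matching pods on the same node
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - hdp
        topologyKey: kubernetes.io/hostname
```

With a required rule, pods remain in the Pending state when there are fewer eligible nodes than replicas, so size the node pool accordingly.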

PostgreSQL system database instances

The following example configures node placement for PostgreSQL primary instances, and anti-affinity and topology spread constraints for read replicas. For the primary, the example shows both the preset settings and a full affinity specification for comparison. Choose one of the two approaches rather than combining them.

postgresql:
  primary:
    # Option 1: presets for simple configuration
    podAntiAffinityPreset: "soft"
    nodeAffinityPreset:
      type: "soft"
      key: "kubernetes.azure.com/agentpool"
      values:
        - "dbpool"

    # Option 2: full affinity specification, used in place of the presets above
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
            - key: kubernetes.azure.com/agentpool
              operator: In
              values:
              - dbpool
    
    topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app.kubernetes.io/component: primary

  readReplicas:
    podAntiAffinityPreset: "soft"
    topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app.kubernetes.io/component: read
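
The presets above are soft, meaning the scheduler treats them as preferences. A stricter variant for the primary might look like the following sketch. It assumes that the preset fields also accept "hard", as they do in the Bitnami PostgreSQL chart, whose layout these keys resemble. Confirm the accepted values in the chart's values.yaml file before relying on it.

```yaml
postgresql:
  primary:
    # Assumption: "hard" is accepted in addition to "soft"
    podAntiAffinityPreset: "hard"
    nodeAffinityPreset:
      type: "hard"
      key: "kubernetes.azure.com/agentpool"
      values:
        - "dbpool"
```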

For descriptions of supported parameters, see Helm chart parameters or refer to the values.yaml file in the Hybrid Data Pipeline Helm chart GitHub repository.

For detailed information on these features, refer to the following Kubernetes documentation resources: