Hybrid Data Pipeline requires a system database for storing account and configuration information. In a Kubernetes deployment, the PostgreSQL system database stores its data on a volume bound through a PVC (PersistentVolumeClaim). The health of that volume is therefore critical to the health of a Hybrid Data Pipeline Kubernetes deployment. The following procedure provides a safe and predictable path to recovering a corrupted PostgreSQL primary volume by copying data from a healthy read replica.

Note: This procedure applies to scenarios where Hybrid Data Pipeline has been deployed with PostgreSQL in replication mode.
Important: This procedure is based on recovering data from a healthy read replica with up-to-date data. If your read replica is not healthy and up-to-date, do not follow this procedure. Instead, restore from backups or snapshots per your standard disaster recovery procedures.
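
One way to check that the read replica is a reasonable recovery source is to query PostgreSQL's built-in recovery functions on the replica pod. The following is a minimal sketch, not a definitive health check. It assumes the replica pod is named <release_name>-postgresql-read-0 (StatefulSet pods are named after the StatefulSet plus an ordinal) and that psql can connect as the postgres user without further credentials. RELEASE, replica_status, and replica_is_standby are hypothetical names introduced for illustration.

```shell
# Hypothetical variable standing in for <release_name>.
RELEASE="${RELEASE:-my-release}"

# Print "<in_recovery>|<last_replay_timestamp>" from the read replica.
# pg_is_in_recovery() returns t on a standby; pg_last_xact_replay_timestamp()
# shows the commit time of the last transaction the replica replayed.
replica_status() {
  kubectl exec "${RELEASE}-postgresql-read-0" -- \
    psql -U postgres -At -c \
    "SELECT pg_is_in_recovery(), pg_last_xact_replay_timestamp();"
}

# Succeed only if the replica answers and reports standby (recovery) mode.
replica_is_standby() {
  case "$(replica_status)" in
    "t|"*) return 0 ;;
    *)     return 1 ;;
  esac
}
```

Running replica_status and comparing the timestamp with the time of the last known write gives a rough indication of how current the replica is. Whether that is current enough remains a judgment for your environment.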

Prerequisites:

  • kubectl access to the Kubernetes cluster
  • Healthy read replica with up-to-date data
  • Administrative access to scale StatefulSets
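
The access prerequisites can be checked before starting, and the current replica counts recorded so that they can be restored when services are scaled back up. The following is a rough sketch under stated assumptions: it runs against the current namespace, it assumes a kubectl version that supports the --subresource flag of kubectl auth can-i, and the names RELEASE, can_scale, and record_replicas are hypothetical. The permission your cluster actually requires for scaling may differ from the update verb checked here.

```shell
# Hypothetical variable standing in for <release_name>.
RELEASE="${RELEASE:-my-release}"

# Succeed if the current user may update the scale subresource of StatefulSets.
can_scale() {
  [ "$(kubectl auth can-i update statefulsets --subresource=scale)" = "yes" ]
}

# Print "<statefulset> <replicas>" for each StatefulSet that the procedure
# scales, so the original counts are available when scaling back up.
record_replicas() {
  for name in hdpserver postgresql-read postgresql-primary; do
    printf '%s %s\n' "${RELEASE}-${name}" \
      "$(kubectl get statefulset "${RELEASE}-${name}" -o jsonpath='{.spec.replicas}')"
  done
}
```

For example, record_replicas > replicas-before.txt keeps a copy of the counts for later reference.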

Recovery Procedure

Take the following steps to recover the PostgreSQL primary PVC.

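  1. Stop all services by scaling the StatefulSets down to zero replicas. This ensures data consistency during recovery. Note the current replica counts before scaling down; you need them when you scale the services back up.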
    
    # Scale down HDP server
    kubectl scale statefulset <release_name>-hdpserver --replicas=0
    
    # Scale down PostgreSQL read replicas
    kubectl scale statefulset <release_name>-postgresql-read --replicas=0
    
    # Scale down PostgreSQL primary
    kubectl scale statefulset <release_name>-postgresql-primary --replicas=0
    
  2. Verify that all pods have been terminated.
    
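    # List the pods in the current namespace
    kubectl get pods
    # Expected response: no hdpserver or postgresql pods are listed.
    # If the namespace contains no other pods, kubectl reports "No resources found".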
    
  3. Restore the corrupted primary PVC by copying data from the read replica to the primary PVC.
    Note:
    • This operation overwrites the data on the primary PVC. Any data that exists only on the primary and not on the read replica will be lost.
    • You may either use a migration tool or manually copy the data directly from the read replica PVC to the primary PVC using Kubernetes Jobs.
  4. Restore services to an operational state by scaling up the StatefulSets.
    
    # Scale up PostgreSQL primary
    kubectl scale statefulset <release_name>-postgresql-primary --replicas=1
    
    # Scale up PostgreSQL read replicas
    kubectl scale statefulset <release_name>-postgresql-read --replicas=<number-of-replicas>
    
    # Scale up HDP server
    kubectl scale statefulset <release_name>-hdpserver --replicas=<number-of-replicas>
  5. Verify that the pods are up and running.
    
    # Check that all pods are running
    kubectl get pods
    
    # Verify PostgreSQL primary is operational
    kubectl exec -it <release_name>-postgresql-primary-0 -- psql -U postgres -c "SELECT version();"
    
    # Verify HDP connectivity
    kubectl exec <release_name>-hdpserver-0 -- curl -f http://localhost:<port>/api/healthcheck
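
For the manual option in step 3, the copy can be performed by a Kubernetes Job that mounts both PVCs while the StatefulSets are scaled down. The following is only a sketch under several assumptions, none of which come from this procedure: the PVC names follow the pattern data-<pod_name> (confirm the actual names with kubectl get pvc), the busybox:1.36 image is available to the cluster, the Job runs with enough privileges to preserve file ownership, and both volumes can be attached to the same node. The names RELEASE, SRC_PVC, DST_PVC, render_copy_job, and pg-primary-restore are hypothetical.

```shell
# Hypothetical variables; confirm the real PVC names with "kubectl get pvc".
RELEASE="${RELEASE:-my-release}"
SRC_PVC="${SRC_PVC:-data-${RELEASE}-postgresql-read-0}"
DST_PVC="${DST_PVC:-data-${RELEASE}-postgresql-primary-0}"

# Print a Job manifest that empties the primary volume and then copies the
# read replica volume into it. The source volume is mounted read-only.
# Replica-specific files (for example standby.signal) are copied as-is;
# how they are handled at startup depends on your PostgreSQL image.
render_copy_job() {
  printf '%s\n' \
    'apiVersion: batch/v1' \
    'kind: Job' \
    'metadata:' \
    '  name: pg-primary-restore' \
    'spec:' \
    '  backoffLimit: 0' \
    '  template:' \
    '    spec:' \
    '      restartPolicy: Never' \
    '      containers:' \
    '        - name: copy' \
    '          image: busybox:1.36' \
    '          command: ["sh", "-c", "find /target -mindepth 1 -delete && cp -a /source/. /target/"]' \
    '          volumeMounts:' \
    '            - name: source' \
    '              mountPath: /source' \
    '              readOnly: true' \
    '            - name: target' \
    '              mountPath: /target' \
    '      volumes:' \
    '        - name: source' \
    '          persistentVolumeClaim:' \
    "            claimName: ${SRC_PVC}" \
    '            readOnly: true' \
    '        - name: target' \
    '          persistentVolumeClaim:' \
    "            claimName: ${DST_PVC}"
}
```

After reviewing the rendered manifest, it could be applied with render_copy_job | kubectl apply -f - and its completion awaited with kubectl wait --for=condition=complete job/pg-primary-restore --timeout=10m before services are scaled back up. Because the Job deletes the contents of the primary volume, double-check that the source and target PVC names are not swapped.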