When I decided to deploy all my services on Kubernetes, one of the critical components was the PostgreSQL database. Deploying databases on Kubernetes can be challenging due to their stateful nature. In this post, I’ll share how I deployed PostgreSQL on Kubernetes, the challenges I faced, and why I chose the CrunchyData PostgreSQL Operator.

Understanding Stateful vs. Stateless Applications

Before diving into deployment, it’s essential to understand the difference between stateful and stateless applications.

  • Stateless Applications: Do not store data or state between sessions. Each request is independent, and the application doesn’t need to remember previous interactions.
  • Stateful Applications: Maintain state across sessions. Databases like PostgreSQL are stateful because they need to store and retrieve data reliably.

Kubernetes was initially designed for stateless applications, but over time, support for stateful workloads has improved with features like StatefulSets and PersistentVolumes.
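To make that concrete, here is a minimal sketch of a stateful workload in Kubernetes. The names, image, and storage size are illustrative, not taken from my actual setup:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: example-db
    spec:
      serviceName: example-db
      replicas: 1
      selector:
        matchLabels:
          app: example-db
      template:
        metadata:
          labels:
            app: example-db
        spec:
          containers:
          - name: postgres
            image: postgres:16
            env:
            - name: POSTGRES_PASSWORD
              value: example # illustration only; use a Secret in practice
            volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 1Gi

Each pod gets its own PersistentVolumeClaim from the template, so its data survives pod restarts, which is exactly what a database needs and what a plain Deployment does not provide.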

PostgreSQL Architecture Overview

PostgreSQL typically follows a primary-replica architecture:

  • Primary Node: Handles all write operations and can also serve reads.
  • Replica Nodes: Handle read operations and replicate data from the primary node.

Replication can be configured as:

  • Synchronous Replication: The primary waits for confirmation from replicas before completing a transaction. This ensures data consistency but can impact performance.
  • Asynchronous Replication: The primary doesn’t wait for replicas, which improves performance but may risk data loss if the primary fails before replication.
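For reference, these two modes map to a couple of settings on the primary. This is a sketch of the relevant postgresql.conf lines; the standby name is hypothetical:

    # Asynchronous (the default): leave synchronous_standby_names empty,
    # and commits return as soon as the primary flushes its own WAL.
    synchronous_standby_names = ''

    # Synchronous: name the standby(s) that must confirm each commit.
    # synchronous_standby_names = 'standby1'
    # synchronous_commit = on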

Handling Failover

In production environments, it’s crucial to handle scenarios where the primary node fails:

  • Automatic Failover: A replica is promoted to primary automatically.
  • Monitoring and Orchestration: Tools are needed to monitor the cluster and manage failover processes.
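Operators typically delegate this orchestration to Patroni (more on that below). As a rough sketch, Patroni's failover behavior is driven by a few timing settings; the values shown are Patroni's documented defaults:

    # Excerpt from a Patroni configuration (bootstrap.dcs section)
    ttl: 30                 # leader lock expires after 30s without renewal
    loop_wait: 10           # seconds between cluster-management cycles
    retry_timeout: 10       # how long to retry DCS/PostgreSQL operations
    maximum_lag_on_failover: 1048576  # max replica lag in bytes to be promotable

If the primary stops renewing its leader lock within ttl seconds, a sufficiently caught-up replica is promoted in its place.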

Deploying PostgreSQL on Kubernetes

To manage PostgreSQL clusters on Kubernetes effectively, several operators have been developed:

  • Zalando Postgres Operator
  • CrunchyData PostgreSQL Operator
  • CloudNativePG
  • And more

These operators automate tasks like deployment, scaling, backups, and failover.

Using the Zalando Postgres Operator

Why Zalando?

I initially chose the Zalando Postgres Operator because:

  • Simplicity: It’s straightforward to deploy.
  • Active Community: It’s maintained by Zalando with good community support.
  • Patroni Integration: Uses Patroni for high availability and automatic failover.

Deployment Steps

  1. Add Helm Repository and Install Operator

    helm repo add postgres-operator-charts https://opensource.zalando.com/postgres-operator/charts/postgres-operator
    helm repo update
    
    kubectl create namespace postgres
    helm install postgres-operator postgres-operator-charts/postgres-operator --namespace postgres
    
  2. Apply PostgreSQL Cluster Manifest

    kubectl apply -f minimal-postgres-manifest.yaml
    

PostgreSQL Cluster Manifest

Here’s the minimal-postgres-manifest.yaml:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: postgres
  namespace: postgres
spec:
  teamId: "acid"
  enableLogicalBackup: true
  numberOfInstances: 2
  users:
    zalando:
    - superuser
    - createdb
    <admin>:
    - createdb
  databases:
    astring_dev: <admin>
  preparedDatabases:
    astring_dev: {}
  postgresql:
    version: "16"
  volume:
    size: "1Gi"

How It Works

  • Operator Handles Logic: The Zalando operator manages the primary-replica setup, failover, and backups using Patroni.
  • Automatic Failover: If the primary node fails, the operator promotes a replica to primary.
  • Scaling: You can adjust numberOfInstances to scale replicas.

Challenges Faced

While Zalando’s operator is excellent, I encountered some issues:

  • Backup Configuration: Difficulty in configuring backups to S3-compatible storage.
  • Documentation Gaps: Limited guidance on restoring from backups and disaster recovery.
  • Customization Limitations: Needed more control over backup schedules and retention policies.

Switching to CrunchyData PostgreSQL Operator

Why CrunchyData?

After facing challenges with Zalando, I explored the CrunchyData PostgreSQL Operator:

  • Advanced Backup Options: Supports full and incremental backups to S3.
  • Comprehensive Documentation: Clear instructions for backup, restore, and disaster recovery.
  • Enhanced Metrics: Provides detailed monitoring for connections, queries, and transactions.
  • Greater Control: More flexibility in configuration and management.

Deployment Steps

  1. Install the Operator

    Follow the CrunchyData installation guide to deploy the operator in your Kubernetes cluster.

  2. Create kustomization.yaml

    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    
    namespace: pgo
    
    secretGenerator:
    - name: pgo-s3-creds
      files:
      - s3.conf
    
    generatorOptions:
      disableNameSuffixHash: true
    
    resources:
    - postgres.yaml
    
  3. Create s3.conf

    [global]
    repo1-s3-key=<key>
    repo1-s3-key-secret=<secret>
    
  4. Create postgres.yaml

    apiVersion: postgres-operator.crunchydata.com/v1beta1
    kind: PostgresCluster
    metadata:
      name: postgres
    spec:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-16.3-1
      postgresVersion: 16
      instances:
        - name: instance1
          replicas: 2
          dataVolumeClaimSpec:
            accessModes:
            - "ReadWriteOnce"
            resources:
              requests:
                storage: 1Gi
      backups:
        pgbackrest:
          image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.51-1
          configuration:
          - secret:
              name: pgo-s3-creds
          global:
            repo1-retention-full: "14"
            repo1-retention-full-type: time
            repo1-path: /pgbackrest/postgres-operator/postgres/repo1
          repos:
          - name: repo1
            schedules:
              full: "0 1 * * 0"
              differential: "0 1 * * 1-6"
            s3:
              bucket: <backup_repo>
              endpoint: "s3.ap-southeast-1.wasabisys.com"
              region: "ap-southeast-1"
      users:
        - name: postgres
          options: 'SUPERUSER'
        - name: <admin_user>
          databases: [astring-prod, astring-dev]
        - name: <admin_user>
          databases: [warehouse]
    
      patroni:
        dynamicConfiguration:
          postgresql:
            pg_hba:
              - "hostnossl all all all md5"
    
      monitoring:
        pgmonitor:
          exporter:
            image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.6.1-0
    
  5. Apply the Manifests

    kubectl apply -k ./
    

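A note on the pg_hba rule in the manifest above: each entry follows PostgreSQL's type / database / user / address / method format, and "hostnossl all all all md5" accepts password (md5) authentication on unencrypted connections from any host. A stricter, purely illustrative alternative would be:

    # TYPE    DATABASE  USER  ADDRESS      METHOD
    hostssl   all       all   0.0.0.0/0    scram-sha-256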
Key Features Configured

  • Instances: Set up with 2 replicas for high availability.
  • Backups:
    • pgBackRest: Configured for backups to S3-compatible storage.
    • Schedules:
      • Full Backups: Every Sunday at 1 AM.
      • Differential Backups: Monday to Saturday at 1 AM.
    • Retention: Keeps backups for 14 days.
  • Users and Databases:
    • Created users with specific roles and database access.
  • Patroni Configuration:
    • Manages automatic failover and replication.
  • Monitoring:
    • Enabled pgMonitor exporter for detailed metrics.

Benefits Experienced

  • Backup and Restore:
    • Seamless configuration of backups to S3.
    • Ability to restore backups in different clusters.
  • Detailed Metrics:
    • Access to comprehensive monitoring data.
  • Flexibility:
    • More control over cluster settings and behaviors.
  • Documentation:
    • Clear guidance on setup, maintenance, and troubleshooting.

Conclusion

Deploying PostgreSQL on Kubernetes requires careful planning, especially for stateful applications needing persistent storage and high availability. While the Zalando Postgres Operator is user-friendly, it didn’t meet all my requirements for backup and customization.

Switching to the CrunchyData PostgreSQL Operator provided the features and control I needed. With robust backup options, detailed metrics, and excellent documentation, it proved to be the better choice for my deployment.