In a Kubernetes cluster, applications and services sometimes require persistent storage, especially when dealing with databases and stateful workloads. Managing this storage efficiently and reliably is crucial for data integrity and application performance. In this post, I’ll delve into the concepts of persistent storage in Kubernetes, explain Persistent Volumes (PV) and Persistent Volume Claims (PVC), discuss stateful versus stateless applications, and share how I implemented Longhorn to address storage challenges in my cluster.

Understanding Persistent Storage in Kubernetes

What Is Persistent Storage?

Persistent storage refers to any data storage device that retains data after power is turned off. In the context of Kubernetes, it’s storage that outlives the lifecycle of individual pods and can be reattached to new pods as needed. This is essential for applications that require data persistence across restarts or scaling events.

Stateless vs. Stateful Applications

  • Stateless Applications: Do not retain any data between sessions or transactions. They rely entirely on external services or databases for data persistence. Examples include web frontends that fetch data from APIs.
  • Stateful Applications: Maintain state across sessions and require persistent storage to save data. Databases like PostgreSQL, message queues, and file storage services are typical examples (see the sketch below).
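
In Kubernetes terms, this distinction usually maps to a Deployment (stateless) versus a StatefulSet (stateful). As a minimal, illustrative sketch (the names and image are placeholders), a StatefulSet requests per-pod persistent storage through volumeClaimTemplates:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres                      # illustrative name
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:               # one PVC per pod, kept across restarts
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi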

Introducing Persistent Volumes (PV) and Persistent Volume Claims (PVC)

Kubernetes abstracts storage using two main resources:

Persistent Volume (PV)

A Persistent Volume is a piece of storage in the cluster, provisioned statically by an administrator or dynamically through a StorageClass. It represents real underlying storage, such as a local disk, an NFS share, or cloud storage.

Persistent Volume Claim (PVC)

A Persistent Volume Claim is a request for storage by a user. It specifies size, access modes, and other requirements. PVCs are bound to PVs, and pods use PVCs to request storage resources.

How PV and PVC Work Together

  1. Provisioning: A PV is created, representing actual storage.
  2. Claiming: A PVC is created by a user, requesting storage.
  3. Binding: Kubernetes matches a PVC to a suitable PV.
  4. Using: Pods use the PVC to access the storage.

This separation decouples storage provisioning from consumption, providing flexibility and scalability.
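
To make the flow concrete, here is a minimal example of static provisioning (the NFS server address and resource names are placeholders): an administrator creates the PV, a user creates the PVC, and Kubernetes binds them based on the requested size and access mode.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-example
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  nfs:                                # placeholder backend
    server: 10.0.0.10
    path: /exports/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-example
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi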

Challenges with Using Host Machine’s Disk

Using the host machine’s disk for storage in Kubernetes clusters presents several challenges:

  • Data Loss Risk: If the disk fails, all data stored on it is lost.
  • Manual Scaling: Increasing storage capacity requires manual intervention, such as adding new disks.
  • Lack of Redundancy: No built-in replication means no fault tolerance.
  • Node Dependency: If a pod is rescheduled to a different node, accessing data on the original node’s disk becomes problematic.
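
The last point is easiest to see with a hostPath volume. In the sketch below (the names and paths are illustrative), data is written straight to the node's filesystem, so a pod rescheduled onto another node finds an empty directory:

apiVersion: v1
kind: Pod
metadata:
  name: hostpath-demo
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumes:
    - name: data
      hostPath:                       # tied to whichever node runs the pod
        path: /mnt/data
        type: DirectoryOrCreate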

Need for a Persistent Storage Solution

To overcome these challenges, a distributed storage system is necessary—one that provides:

  • Replication: Copies data across multiple nodes for redundancy.
  • Dynamic Provisioning: Automatically allocates storage as needed.
  • Scalability: Easily increases storage capacity.
  • Data Locality: Places data close to where it’s consumed to reduce latency.
  • Centralized Management: Simplifies administration and monitoring.

Introducing Longhorn

What Is Longhorn?

Longhorn is an open-source, lightweight, and reliable distributed block storage system for Kubernetes. It creates a highly available persistent storage solution by replicating block storage across multiple nodes.

Why I Chose Longhorn

  • Simplicity: Easy to deploy and manage within Kubernetes.
  • Lightweight: Minimal resource overhead.
  • Features:
    • Replication: Ensures data redundancy.
    • Incremental Snapshots and Backups: Facilitates data protection and disaster recovery.
    • Data Locality: Optimizes performance by keeping data close to the consuming pod.
    • Centralized UI: Simplifies management and monitoring.
  • Flexibility: Supports various storage backends and cloud providers.

Installing Longhorn with Helm

I used Helm to install Longhorn in my cluster:

helm repo add longhorn https://charts.longhorn.io
helm repo update

helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace --version 1.7.1

These commands:

  • Add the Longhorn Helm repository.
  • Update the Helm repositories to fetch the latest charts.
  • Install Longhorn 1.7.1 into the longhorn-system namespace.
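
To confirm the installation, check that all pods in the namespace reach the Running state:

kubectl -n longhorn-system get pods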

Accessing Longhorn UI

After installation, Longhorn provides a web-based UI for managing volumes, nodes, and settings. You can expose the UI using an Ingress resource or port forwarding.
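
For a quick look without an Ingress, you can port-forward the UI service (named longhorn-frontend in a default install) and open http://localhost:8080:

kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80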

Understanding Storage Classes

What Is a Storage Class?

A Storage Class in Kubernetes provides a way to describe the “classes” of storage available. It defines the provisioner (e.g., Longhorn), parameters, and reclaim policy.

Longhorn Storage Class

Upon installation, Longhorn creates a default storage class named longhorn. To use Longhorn for dynamic provisioning, specify this storage class in your PVCs:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
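
A pod then consumes the claim like any other volume. A minimal sketch (the name and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-pvc             # the claim defined above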

Longhorn Features and Configuration

Replication for High Availability

By default, Longhorn keeps three replicas of each volume on different nodes. This ensures that if one node fails, the data is still accessible from the other replicas.

  • Advantage: Provides fault tolerance and high availability.
  • Disadvantage: Can introduce latency due to network communication between nodes.
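
The default replica count can also be changed cluster-wide, either in the Longhorn UI or, as a sketch, through the Helm chart's defaultSettings (assuming the values exposed by current chart releases):

helm upgrade longhorn longhorn/longhorn --namespace longhorn-system \
  --set defaultSettings.defaultReplicaCount=2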

Backups and Snapshots

Longhorn supports:

  • Incremental Snapshots: Capture the state of a volume at a point in time.
  • Backups to S3: Configure backups to S3-compatible object storage for disaster recovery (sketched below).
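
As a sketch, the backup target can be wired up through the chart's defaultSettings; the bucket, region, and secret name below are placeholders, and the secret holds standard AWS-style credentials:

kubectl -n longhorn-system create secret generic longhorn-backup-secret \
  --from-literal=AWS_ACCESS_KEY_ID=<access-key> \
  --from-literal=AWS_SECRET_ACCESS_KEY=<secret-key>

helm upgrade longhorn longhorn/longhorn --namespace longhorn-system \
  --set defaultSettings.backupTarget="s3://my-backup-bucket@us-east-1/" \
  --set defaultSettings.backupTargetCredentialSecret=longhorn-backup-secret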

Centralized Management

The Longhorn UI allows you to:

  • Monitor Volumes: View status, health, and performance metrics.
  • Manage Replicas: Adjust replication settings per volume.
  • Configure Settings: Set global defaults and advanced options.

Optimizing Longhorn for Databases

The Issue with Replication

Databases like PostgreSQL and ScyllaDB often have their own replication mechanisms at the application level. Using storage-level replication in addition can lead to:

  • Increased Latency: Replication at the storage layer adds network overhead.
  • Redundant Replication: Duplicates effort since the database already handles data redundancy.

Solution: Adjusting Replication Factor

For databases that handle their own replication:

  • Set Replicas to 1: Configure Longhorn volumes used by these databases to have a single replica.
  • Benefits:
    • Reduced Latency: Eliminates the overhead of storage-level replication.
    • Simplified Recovery: Rely on the database’s replication and backup strategies.

How to Adjust Replication

You can specify the number of replicas when creating a PVC by defining a custom StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-1-replica
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"

Then, reference this storage class in your PVC:

spec:
  storageClassName: longhorn-1-replica

Enhancing Performance with Data Locality

Understanding Data Locality

Data Locality ensures that the data used by a pod is stored on the same node where the pod is running. This reduces latency by eliminating cross-node network communication for disk I/O operations.

The Challenge

By default, Longhorn volumes can be attached to any node, and the data may reside on a different node than the consuming pod, leading to:

  • Increased Latency: Due to network hops between nodes.
  • Potential Bottlenecks: Network issues can affect disk I/O performance.

Longhorn’s Data Locality Feature

Longhorn provides a Data Locality setting with options:

  • Disabled: No attempt to ensure data is local (default).
  • Best Effort: Tries to keep data local when possible.
  • Strict Local: Ensures data is always local to the pod’s node.

Enabling Data Locality

You can enable data locality per volume or globally, either from the Longhorn UI or in your manifests.

  • Per Volume:

    apiVersion: longhorn.io/v1beta2
    kind: Volume
    metadata:
      name: my-volume
      namespace: longhorn-system
    spec:
      dataLocality: best-effort   # disabled | best-effort | strict-local
    
  • Globally: Adjust the setting in the Longhorn UI under Settings.
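
Because dataLocality is also accepted as a Longhorn StorageClass parameter, you can combine it with the replica setting from earlier so that every volume provisioned from the class inherits both. An illustrative class:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-local-1-replica      # illustrative name
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"
  dataLocality: "best-effort"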

Choosing the Right Setting

  • Best Effort: Suitable for most cases. Longhorn will attempt to keep data local but won’t fail if it can’t.
  • Strict Local: Use when you require data to be on the same node, understanding that it may limit scheduling flexibility.

Conclusion

Managing persistent storage in Kubernetes is critical for stateful applications. Longhorn offers a robust solution by providing:

  • Easy Deployment: Simple installation and integration with Kubernetes.
  • High Availability: Through replication (configurable as needed).
  • Flexible Configuration: Adjust replication factors and data locality based on workload requirements.
  • Centralized Management: User-friendly UI for monitoring and administration.
  • Disaster Recovery: Snapshots and backups to external storage.

By understanding the storage needs of your applications and configuring Longhorn appropriately, you can achieve a balance between performance, reliability, and efficiency in your Kubernetes cluster.