In the previous part, we discussed how to monitor and visualize metrics using Prometheus and Grafana. However, metrics alone aren’t sufficient for complete observability. To understand the full picture of your system’s health and behavior, you also need to collect and analyze logs. Logs provide detailed insights into what’s happening inside your applications and services, allowing you to diagnose issues and understand system behavior at a granular level.

In this post, we’ll explore how to set up centralized logging in Kubernetes using Fluent Bit and Loki, and how to integrate logs into your existing Grafana dashboards for unified monitoring.

Why Collect Logs in Kubernetes?

Kubernetes clusters run numerous services, including system components (like the API server and scheduler) and your own applications. While metrics can tell you that something is wrong (e.g., high CPU usage, increased error rates), logs help you understand why it’s happening.

Logs are crucial for:

  • Debugging and Troubleshooting: Identifying the root cause of issues.
  • Auditing: Keeping track of system events for compliance and security.
  • Performance Analysis: Understanding application behavior under different conditions.

Challenges with Logging in Kubernetes

Collecting logs in Kubernetes presents several challenges:

  • Volume: Logs can be voluminous, especially in large clusters or applications with verbose logging.
  • Ephemeral Nature of Pods: Containers and pods are transient, making it difficult to retain logs locally.
  • Distributed Environment: Logs are scattered across multiple nodes and pods.
  • Standardization: Logs may come in various formats, making aggregation and analysis challenging.

To address these challenges, it’s essential to centralize logs from all pods and services in a consistent and efficient manner.

Why Not Use Prometheus for Logs?

While Prometheus is excellent for collecting metrics, it’s not designed for log aggregation. Metrics are structured, numerical data points collected at regular intervals, resulting in relatively small data volumes. Logs, on the other hand, are unstructured or semi-structured text data generated continuously, leading to significantly larger data volumes.

Prometheus is optimized for numeric time-series data with relatively low label cardinality; it is not built to store and query large volumes of unstructured log data. Therefore, we need a dedicated logging solution.

Common Logging Solutions

ELK Stack

The ELK stack is a popular logging solution consisting of:

  • Elasticsearch: A search and analytics engine.
  • Logstash: A data processing pipeline that ingests logs, transforms them, and sends them to Elasticsearch.
  • Kibana: A visualization tool for Elasticsearch data.

While powerful, the ELK stack can be resource-intensive and complex to manage, especially for smaller clusters. It also introduces an additional dashboard (Kibana) separate from Grafana.

Fluent Bit and Loki

An alternative is to use Fluent Bit for log collection and Loki for storage and querying:

  • Fluent Bit: A lightweight log processor and forwarder, suitable for high-performance log ingestion.
  • Loki: A log aggregation system designed to store and query logs efficiently by indexing metadata instead of the full log content.
  • Grafana: Since Loki is developed by Grafana Labs, it integrates seamlessly with Grafana dashboards.

This stack is more lightweight than ELK and allows you to visualize logs within Grafana, keeping all monitoring in a single interface.

Setting Up Fluent Bit and Loki in Kubernetes

Overview of the Architecture

  1. Fluent Bit runs as a DaemonSet on each Kubernetes node, collecting logs from all pods.
  2. Fluent Bit forwards the logs to Loki.
  3. Loki stores the logs efficiently, indexing only metadata (labels), as illustrated below.
  4. Grafana connects to Loki as a data source, allowing you to query and visualize logs alongside your metrics.
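
To make point 3 concrete, here is a hypothetical log entry as Loki sees it: only the stream labels are indexed, while the log line itself is stored compressed in object storage and scanned at query time.

{job="fluent-bit", app="nginx", level="info"}   <- stream labels (indexed)
2024-06-01T12:00:03Z GET /healthz 200 2ms       <- log line (compressed, not indexed)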

Deploying Loki with Helm

We’ll use the Loki Helm chart to deploy Loki in our cluster.

Step 1: Add the Grafana Helm Repository

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

Step 2: Create a values.yaml for Loki Configuration

Create a values.yaml file to customize the Loki deployment:

backend:
  replicas: 0
bloomCompactor:
  replicas: 0
bloomGateway:
  replicas: 0
chunksCache:
  writebackSizeLimit: 100MB
compactor:
  replicas: 0
deploymentMode: SingleBinary
distributor:
  replicas: 0
indexGateway:
  replicas: 0
ingester:
  replicas: 0
loki:
  commonConfig:
    replication_factor: 1
  ingester:
    chunk_encoding: snappy
  querier:
    max_concurrent: 2
  schemaConfig:
    configs:
    - from: "2024-06-01"
      index:
        period: 24h
        prefix: loki_index_
      object_store: s3
      schema: v13
      store: tsdb
  storage:
    bucketNames:
      admin: k8s-loki-chunks
      chunks: k8s-loki-chunks
      ruler: k8s-loki-chunks
    s3:
      accessKeyId: <secret>
      insecure: false
      region: ap-southeast-1
      s3: s3://<secret>:<key>@s3.ap-southeast-1.wasabisys.com/k8s-loki-chunks
      s3ForcePathStyle: false
      secretAccessKey: <key>
    type: s3
  tracing:
    enabled: true
minio:
  enabled: false
querier:
  replicas: 0
queryFrontend:
  replicas: 0
queryScheduler:
  replicas: 0
read:
  replicas: 0
singleBinary:
  extraEnv:
  - name: GOMEMLIMIT
    value: 2750MiB
  replicas: 1
  resources:
    limits:
      cpu: 3
      memory: 3Gi
    requests:
      cpu: 2
      memory: 1Gi
write:
  replicas: 0

Explanation:

  • deploymentMode: SingleBinary runs all Loki components in a single pod, a simple setup suitable for small clusters. This is also why the individual component sections (backend, read, write, querier, and so on) have their replicas set to 0.
  • schemaConfig: Configures the storage schema; the v13 TSDB schema takes effect from 2024-06-01.
  • storage: Configures S3-compatible object storage for logs (here, Wasabi). Replace the <secret> and <key> placeholders with your own access key ID and secret access key.
  • singleBinary: Runs a single Loki replica and sets its resource requests, limits, and a GOMEMLIMIT so Go's memory usage stays below the pod limit.

Step 3: Install Loki with Helm

helm install loki grafana/loki --namespace logging --create-namespace -f values.yaml
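
Once the chart is installed, check that the single-binary Loki pod and its gateway Service are running (the exact resource names depend on the release name, here loki):

kubectl get pods -n logging
kubectl get svc loki-gateway -n logging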

Deploying Fluent Bit with Helm

We’ll use the Fluent Bit Helm chart to deploy Fluent Bit as a DaemonSet.

Step 1: Create a values.yaml for Fluent Bit Configuration

Create a values.yaml file for Fluent Bit:

args:
- -e
- /fluent-bit/bin/out_grafana_loki.so
- --workdir=/fluent-bit/etc
- --config=/fluent-bit/etc/conf/fluent-bit.conf

config:
  inputs: |
    [INPUT]
        Name tail
        Tag kube.*
        Path /var/log/containers/*.log
        # Be aware that local clusters like docker-desktop or kind use the docker log format and not the cri (https://docs.fluentbit.io/manual/installation/kubernetes#container-runtime-interface-cri-parser)
        multiline.parser docker, cri
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On    
  outputs: |
    [OUTPUT]
        Name grafana-loki
        Match kube.*
        Url ${FLUENT_LOKI_URL}
        TenantID foo
        Labels {job="fluent-bit"}
        LabelKeys level,app # this sets the values for actual Loki streams and the other labels are converted to structured_metadata https://grafana.com/docs/loki/<LOKI_VERSION>/get-started/labels/structured-metadata/
        BatchWait 1
        BatchSize 1001024
        LineFormat json
        LogLevel info
        AutoKubernetesLabels true    
env:
- name: FLUENT_LOKI_URL
  value: http://loki-gateway.logging.svc.cluster.local/loki/api/v1/push
image:
  repository: grafana/fluent-bit-plugin-loki
  tag: main-e2ed1c0

Explanation:

  • args: Loads the Loki output plugin (out_grafana_loki.so) via the -e flag and points Fluent Bit at its configuration file.
  • config.inputs: Defines the tail input that reads the container log files on each node.
  • config.outputs: Configures the Loki output plugin, setting job, level, and app as stream labels.
  • env: Sets the FLUENT_LOKI_URL environment variable referenced in the output configuration.
  • image: Specifies the Fluent Bit image that bundles the Loki plugin.

Step 2: Install Fluent Bit with Helm

helm repo add fluent https://fluent.github.io/helm-charts
helm repo update

helm install fluent-bit fluent/fluent-bit --namespace logging -f values.yaml
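
Fluent Bit should now be running as a DaemonSet with one pod per node. A quick way to confirm it is collecting and forwarding logs is to check the DaemonSet status and Fluent Bit's own output for errors:

kubectl get daemonset fluent-bit -n logging
kubectl logs daemonset/fluent-bit -n logging --tail=20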

Integrating Loki with Grafana

If you have Grafana installed (e.g., from the kube-prometheus-stack), you can add Loki as a data source.

Step 1: Add Loki as a Data Source

  1. Log in to Grafana.
  2. Go to Configuration (gear icon) > Data Sources.
  3. Click Add data source.
  4. Select Loki from the list.
  5. Configure the connection settings. With the setup above, the URL is http://loki-gateway.logging.svc.cluster.local (the Loki gateway Service created by the Helm chart in the logging namespace); the other settings can keep their defaults.
  6. Click Save & Test to verify the connection.
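
Alternatively, if Grafana is managed by the kube-prometheus-stack Helm chart, the data source can be provisioned declaratively instead of through the UI. A minimal sketch, assuming the chart's grafana.additionalDataSources values field:

grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki-gateway.logging.svc.cluster.local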

Visualizing Logs in Grafana

Now you can explore logs in Grafana:

  • Explore: Use the Explore tab to query logs using LogQL (see the example queries after this list).
  • Dashboards: Create dashboards that include both metrics and logs.
  • Alerts: Set up alerts based on log patterns or error rates.
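
For example, with the labels defined in the Fluent Bit configuration above (job, app, and level), the following LogQL queries work in the Explore tab. All logs shipped by Fluent Bit that contain the word "error":

{job="fluent-bit"} |= "error"

Per-application rate of error-level log lines over the last five minutes:

sum by (app) (rate({job="fluent-bit", level="error"}[5m]))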

Benefits of Using Fluent Bit and Loki

  • Lightweight: Both Fluent Bit and Loki are designed to be resource-efficient.
  • Scalable: Suitable for clusters of all sizes.
  • Unified Monitoring: View metrics and logs in a single Grafana dashboard.
  • Simplified Management: Easier to set up and maintain compared to heavier solutions like the ELK stack.
  • Cost-Effective: Loki indexes only metadata, reducing storage costs.

Considerations

  • Storage: Logs can consume significant storage. Using S3-compatible object storage helps manage this efficiently.
  • Retention Policies: Configure retention periods according to your needs to prevent excessive storage use (see the sketch after this list).
  • Security: Ensure that access to logs is secured, especially if they contain sensitive information.
  • Labeling: Proper labeling of logs enhances query capabilities in Loki.
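
As a starting point for retention, Loki can delete old chunks through its compactor. The sketch below shows the relevant settings in the same values.yaml used earlier; the exact field names can differ between Loki and chart versions, so verify them against the Loki documentation for your release.

loki:
  limits_config:
    retention_period: 720h   # keep logs for 30 days
  compactor:
    retention_enabled: true
    delete_request_store: s3

In SingleBinary mode the compactor module runs inside the same Loki pod, so no separate compactor deployment is needed.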

Conclusion

By implementing Fluent Bit and Loki for logging in Kubernetes, you gain deeper insights into your system’s behavior while maintaining a lightweight and efficient logging infrastructure. Integrating logs into your Grafana dashboards provides a unified view of your cluster’s health, enabling you to monitor and troubleshoot effectively.

Now, with this setup, you can:

  • Collect logs from all pods and nodes in your Kubernetes cluster.
  • Store logs efficiently using Loki and object storage.
  • Visualize and query logs in Grafana alongside your metrics.