In the previous part, we discussed how to monitor and visualize metrics using Prometheus and Grafana. However, metrics alone aren’t sufficient for complete observability. To understand the full picture of your system’s health and behavior, you also need to collect and analyze logs. Logs provide detailed insights into what’s happening inside your applications and services, allowing you to diagnose issues and understand system behavior at a granular level.
In this post, we’ll explore how to set up centralized logging in Kubernetes using Fluent Bit and Loki, and how to integrate logs into your existing Grafana dashboards for unified monitoring.
Why Collect Logs in Kubernetes?
Kubernetes clusters run numerous services, including system components (like the API server and scheduler) and your own applications. While metrics can tell you that something is wrong (e.g., high CPU usage, increased error rates), logs help you understand why it’s happening.
Logs are crucial for:
- Debugging and Troubleshooting: Identifying the root cause of issues.
- Auditing: Keeping track of system events for compliance and security.
- Performance Analysis: Understanding application behavior under different conditions.
Challenges with Logging in Kubernetes
Collecting logs in Kubernetes presents several challenges:
- Volume: Logs can be voluminous, especially in large clusters or applications with verbose logging.
- Ephemeral Nature of Pods: Containers and pods are transient, making it difficult to retain logs locally.
- Distributed Environment: Logs are scattered across multiple nodes and pods.
- Standardization: Logs may come in various formats, making aggregation and analysis challenging.
To address these challenges, it’s essential to centralize logs from all pods and services in a consistent and efficient manner.
Why Not Use Prometheus for Logs?
While Prometheus is excellent for collecting metrics, it’s not designed for log aggregation. Metrics are structured, numerical data points collected at regular intervals, resulting in relatively small data volumes. Logs, on the other hand, are unstructured or semi-structured text data generated continuously, leading to significantly larger data volumes.
Prometheus's storage engine is built for numeric time series and degrades quickly under the kind of cardinality that log data would introduce; it is not designed for storing and querying large volumes of free-form log lines. For that, we need a dedicated logging solution.
Common Logging Solutions
ELK Stack
The ELK stack is a popular logging solution consisting of:
- Elasticsearch: A search and analytics engine.
- Logstash: A data processing pipeline that ingests logs, transforms them, and sends them to Elasticsearch.
- Kibana: A visualization tool for Elasticsearch data.
While powerful, the ELK stack can be resource-intensive and complex to manage, especially for smaller clusters. It also introduces an additional dashboard (Kibana) separate from Grafana.
Fluent Bit and Loki
An alternative is to use Fluent Bit for log collection and Loki for storage and querying:
- Fluent Bit: A lightweight log processor and forwarder, suitable for high-performance log ingestion.
- Loki: A log aggregation system designed to store and query logs efficiently by indexing metadata instead of the full log content.
- Grafana: Since Loki is developed by Grafana Labs, it integrates seamlessly with Grafana dashboards.
This stack is more lightweight than ELK and allows you to visualize logs within Grafana, keeping all monitoring in a single interface.
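To make the label-based model concrete: Loki identifies each log stream by its set of labels, and only those labels are indexed; the log lines themselves are compressed into chunks and stored unindexed. As an illustration (the label values and log lines below are made up), a single stream might look like this:

{job="fluent-bit", app="checkout", level="error"}
2024-06-02T10:15:04Z msg="payment failed" order_id=9812
2024-06-02T10:15:09Z msg="payment failed" order_id=9817

A query first selects streams by their labels and then scans the matching chunks, which is why Loki works best with a small, low-cardinality set of labels.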
Setting Up Fluent Bit and Loki in Kubernetes
Overview of the Architecture
- Fluent Bit runs as a DaemonSet on each Kubernetes node, collecting logs from all pods.
- Fluent Bit forwards the logs to Loki.
- Loki stores the logs efficiently, indexing only metadata (labels).
- Grafana connects to Loki as a data source, allowing you to query and visualize logs alongside your metrics.
Deploying Loki with Helm
We’ll use the Loki Helm chart to deploy Loki in our cluster.
Step 1: Add the Grafana Helm Repository
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
Step 2: Create a values.yaml for Loki Configuration
Create a values.yaml file to customize the Loki deployment:
backend:
  replicas: 0
bloomCompactor:
  replicas: 0
bloomGateway:
  replicas: 0
chunksCache:
  writebackSizeLimit: 100MB
compactor:
  replicas: 0
deploymentMode: SingleBinary
distributor:
  replicas: 0
indexGateway:
  replicas: 0
ingester:
  replicas: 0
loki:
  commonConfig:
    replication_factor: 1
  ingester:
    chunk_encoding: snappy
  querier:
    max_concurrent: 2
  schemaConfig:
    configs:
      - from: "2024-06-01"
        index:
          period: 24h
          prefix: loki_index_
        object_store: s3
        schema: v13
        store: tsdb
  storage:
    bucketNames:
      admin: k8s-loki-chunks
      chunks: k8s-loki-chunks
      ruler: k8s-loki-chunks
    s3:
      accessKeyId: <secret>
      insecure: false
      region: ap-southeast-1
      s3: s3://<secret>:<key>@s3.ap-southeast-1.wasabisys.com/k8s-loki-chunks
      s3ForcePathStyle: false
      secretAccessKey: <key>
    type: s3
  tracing:
    enabled: true
minio:
  enabled: false
querier:
  replicas: 0
queryFrontend:
  replicas: 0
queryScheduler:
  replicas: 0
read:
  replicas: 0
singleBinary:
  extraEnv:
    - name: GOMEMLIMIT
      value: 2750MiB
  replicas: 1
  resources:
    limits:
      cpu: 3
      memory: 3Gi
    requests:
      cpu: 2
      memory: 1Gi
write:
  replicas: 0
Explanation:
- deploymentMode: Uses SingleBinary mode, a simple setup suitable for small clusters.
- schemaConfig: Configures the storage schema; the TSDB/v13 schema applies to logs written from 2024-06-01 onwards.
- storage: Configures S3-compatible object storage for the log chunks (e.g., Wasabi). Replace the <secret> and <key> placeholders with your access key ID and secret access key.
- singleBinary: Specifies resource requests and limits (and a GOMEMLIMIT) for the single Loki pod.
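Before installing, you can optionally render the chart locally to confirm that your values.yaml produces the manifests you expect (this assumes the grafana Helm repository added in Step 1):

helm template loki grafana/loki --namespace logging -f values.yaml | less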
Step 3: Install Loki with Helm
helm install loki grafana/loki --namespace logging --create-namespace -f values.yaml
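Once the release is up, it is worth checking that the single-binary pod is running and that Loki reports itself ready. The service name and port below are a sketch based on this chart's defaults (a loki-gateway service listening on port 80); adjust them if your chart version differs:

kubectl get pods -n logging
kubectl port-forward -n logging svc/loki-gateway 3100:80
curl http://localhost:3100/ready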
Deploying Fluent Bit with Helm
We’ll use the Fluent Bit Helm chart to deploy Fluent Bit as a DaemonSet.
Step 1: Create a values.yaml for Fluent Bit Configuration
Create a values.yaml file for Fluent Bit:
args:
  - -e
  - /fluent-bit/bin/out_grafana_loki.so
  - --workdir=/fluent-bit/etc
  - --config=/fluent-bit/etc/conf/fluent-bit.conf
config:
  inputs: |
    [INPUT]
        Name tail
        Tag kube.*
        Path /var/log/containers/*.log
        # Be aware that local clusters like docker-desktop or kind use the docker log format and not the cri (https://docs.fluentbit.io/manual/installation/kubernetes#container-runtime-interface-cri-parser)
        multiline.parser docker, cri
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On
  outputs: |
    [OUTPUT]
        Name grafana-loki
        Match kube.*
        Url ${FLUENT_LOKI_URL}
        TenantID foo
        Labels {job="fluent-bit"}
        LabelKeys level,app # this sets the values for actual Loki streams and the other labels are converted to structured_metadata https://grafana.com/docs/loki/<LOKI_VERSION>/get-started/labels/structured-metadata/
        BatchWait 1
        BatchSize 1001024
        LineFormat json
        LogLevel info
        AutoKubernetesLabels true
env:
  - name: FLUENT_LOKI_URL
    value: http://loki-gateway.logging.svc.cluster.local/loki/api/v1/push
image:
  repository: grafana/fluent-bit-plugin-loki
  tag: main-e2ed1c0
Explanation:
- args: Loads the Loki output plugin (out_grafana_loki.so) and points Fluent Bit at its configuration.
- config.inputs: Defines the input plugin that tails the container log files on each node (/var/log/containers/*.log).
- config.outputs: Sends the collected logs to Loki, attaching the job="fluent-bit" label and promoting the level and app keys to stream labels (see the illustration after this list).
- env: Sets the FLUENT_LOKI_URL environment variable to Loki's push endpoint.
- image: Specifies the Fluent Bit image that bundles the Loki output plugin.
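As an illustration of how LabelKeys behaves (assuming the record reaching the output stage already contains level and app keys, for example because your applications emit JSON logs that are parsed upstream), a made-up record such as:

{"level": "error", "app": "checkout", "msg": "payment failed", "order_id": 9812}

would be shipped into the stream {job="fluent-bit", level="error", app="checkout"}, with the remaining fields kept in the log line itself and, with AutoKubernetesLabels enabled, Kubernetes pod labels added as stream labels as well.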
Step 2: Install Fluent Bit with Helm
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
helm install fluent-bit fluent/fluent-bit --namespace logging -f values.yaml
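After the install, verify that the DaemonSet has a pod on every node and that those pods are shipping logs to Loki without errors, for example:

kubectl get daemonset -n logging fluent-bit
kubectl logs -n logging daemonset/fluent-bit --tail=50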
Integrating Loki with Grafana
If you have Grafana installed (e.g., from the kube-prometheus-stack), you can add Loki as a data source.
Step 1: Add Loki as a Data Source
- Log in to Grafana.
- Go to Configuration (gear icon) > Data Sources.
- Click Add data source.
- Select Loki from the list.
- Configure the connection settings: set the URL to your Loki endpoint. If Grafana runs in the same cluster, this is typically the gateway service used above, http://loki-gateway.logging.svc.cluster.local.
- Click Save & Test to verify the connection.
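Alternatively, if Grafana came from the kube-prometheus-stack Helm chart, you can provision the data source declaratively instead of clicking through the UI. A minimal sketch, assuming the chart's grafana.additionalDataSources values key and the in-cluster gateway address used earlier:

grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki-gateway.logging.svc.cluster.local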
Visualizing Logs in Grafana
Now you can explore logs in Grafana:
- Explore: Use the Explore tab to query logs with LogQL (a few sample queries follow this list).
- Dashboards: Create dashboards that include both metrics and logs.
- Alerts: Set up alerts based on log patterns or error rates.
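Some example LogQL queries to get started (the job label matches the Fluent Bit configuration above; the app label, the "error" filter, and the 5m window are only examples):
- {job="fluent-bit"}: all logs shipped by Fluent Bit.
- {job="fluent-bit", app="checkout"} |= "error": logs from a single app, filtered for lines containing "error".
- sum(rate({job="fluent-bit"} |= "error" [5m])): the per-second rate of error-containing lines over the last five minutes, a useful basis for alerts.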
Benefits of Using Fluent Bit and Loki
- Lightweight: Both Fluent Bit and Loki are designed to be resource-efficient.
- Scalable: Suitable for clusters of all sizes.
- Unified Monitoring: View metrics and logs in a single Grafana dashboard.
- Simplified Management: Easier to set up and maintain compared to heavier solutions like the ELK stack.
- Cost-Effective: Loki indexes only metadata, reducing storage costs.
Considerations
- Storage: Logs can consume significant storage. Using S3-compatible object storage helps manage this efficiently.
- Retention Policies: Configure retention periods according to your needs to prevent excessive storage use (see the sketch after this list).
- Security: Ensure that access to logs is secured, especially if they contain sensitive information.
- Labeling: Proper labeling of logs enhances query capabilities in Loki.
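A minimal sketch of a retention setting, extending the loki section of the values.yaml shown earlier. The 30-day period is only an example, and the exact keys depend on your chart and Loki version (retention is enforced by the compactor, so check the Loki documentation before relying on it):

loki:
  limits_config:
    retention_period: 720h   # keep logs for roughly 30 days
  compactor:
    retention_enabled: true
    delete_request_store: s3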
Conclusion
By implementing Fluent Bit and Loki for logging in Kubernetes, you gain deeper insights into your system’s behavior while maintaining a lightweight and efficient logging infrastructure. Integrating logs into your Grafana dashboards provides a unified view of your cluster’s health, enabling you to monitor and troubleshoot effectively.
Now, with this setup, you can:
- Collect logs from all pods and nodes in your Kubernetes cluster.
- Store logs efficiently using Loki and object storage.
- Visualize and query logs in Grafana alongside your metrics.