
# Grafana Loki Log Aggregation on Ubuntu 24.04: Production Deployment Guide 2026


When a single server becomes a fleet, grep over SSH stops being a log strategy. Centralized log aggregation is no longer optional, but the old ELK stack is heavy, expensive to operate, and usually overkill. Grafana Loki takes a different approach: it indexes only labels, stores the raw log data in cheap object storage, and integrates natively with Grafana dashboards you already run. In 2026, Loki 3.3 brings native structured metadata, TSDB-backed block storage, and first-class Bloom filters that make queries on billions of log lines practical. This guide walks through a production-grade Loki deployment on Ubuntu 24.04 with Promtail agents shipping logs from the rest of your fleet.

## Architecture Overview

A typical Loki deployment in 2026 has three moving parts: Loki itself (ingester, querier, distributor), an object storage backend (S3, GCS, or MinIO), and agents that ship logs. You can run Loki in monolithic, simple scalable, or microservices mode. For 10 to 100 hosts, simple scalable is the sweet spot — one `read` replica and two `write` replicas on modest hardware handle hundreds of gigabytes per day.

## Installing Loki

On Ubuntu 24.04, install from the official deb repository:

```bash
curl -fsSL https://apt.grafana.com/gpg.key | sudo tee /etc/apt/keyrings/grafana.asc
echo "deb [signed-by=/etc/apt/keyrings/grafana.asc] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y loki promtail grafana
```

Create `/etc/loki/config.yml`:

```yaml
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /var/lib/loki
  storage:
    s3:
      endpoint: s3.us-east-1.amazonaws.com
      bucketnames: acme-loki
      region: us-east-1
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  retention_period: 720h
  max_query_series: 5000
  ingestion_rate_mb: 20
  ingestion_burst_size_mb: 40

compactor:
  working_directory: /var/lib/loki/compactor
  retention_enabled: true
  delete_request_store: s3
```

Enable and start:

```bash
sudo systemctl enable --now loki
sudo journalctl -u loki -f
```

## Shipping Logs with Promtail

On each host, install Promtail and configure it to read the journal and key log files:

```yaml
server:
  http_listen_port: 9080

positions:
  filename: /var/lib/promtail/positions.yaml

clients:
  - url: http://loki.internal.acme.com:3100/loki/api/v1/push

scrape_configs:
  - job_name: journal
    journal:
      max_age: 12h
      labels:
        job: systemd-journal
        host: ${HOSTNAME}
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: unit

  - job_name: nginx
    static_configs:
      - targets: [localhost]
        labels:
          job: nginx
          host: ${HOSTNAME}
          __path__: /var/log/nginx/*.log
```

Restart with `sudo systemctl restart promtail`. Note that `${HOSTNAME}` expansion requires starting Promtail with the `-config.expand-env=true` flag.

## Keeping Labels Under Control

The single most common Loki mistake is over-labeling. Every unique combination of labels creates a new stream, and millions of streams turn queries to molasses. Do not put path, user ID, request ID, or IP address in labels. Those belong in the log line itself, indexed by Loki 3’s structured metadata or extracted at query time with LogQL.

Good labels: `host`, `job`, `env`, `namespace`, `app`. That is about it.

## Querying with LogQL

LogQL looks like PromQL with log filters. Find all Nginx 5xx in the last hour:

```logql
{job="nginx"} |= "HTTP/1.1" | regexp `" (?P<status>5\d\d) ` | status != ""
```

Count errors per host:

```logql
sum by (host) (count_over_time({job="nginx"} |~ " 5\\d\\d " [5m]))
```

Find slow queries from Postgres logs:

```logql
{job="postgres"} | logfmt | duration > 1000
```

## Dashboards and Alerts

Add Loki as a Grafana datasource (URL `http://localhost:3100`). Build dashboards that mix metrics and logs: a panel showing CPU from Prometheus alongside a Loki panel of recent errors on the same host is a common pattern.

Alerting uses Ruler with the same PromQL-like rules:

```yaml
groups:
  - name: nginx
    rules:
      - alert: HighNginxErrorRate
        expr: |
          sum by (host) (rate({job="nginx"} |~ " 5\\d\\d " [5m])) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High 5xx rate on {{ $labels.host }}"
```

Point the Loki ruler at your Alertmanager via `ruler.alertmanager_url`.
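As a sketch, a minimal `ruler` block might look like the following; the rule directory paths and Alertmanager URL are illustrative assumptions, not values from this deployment:

```yaml
ruler:
  storage:
    type: local
    local:
      directory: /etc/loki/rules      # rule group YAML files live here, per tenant
  rule_path: /var/lib/loki/rules-temp # scratch space the ruler uses at runtime
  alertmanager_url: http://alertmanager.internal.acme.com:9093
  enable_api: true                    # lets Grafana manage rules over the ruler API
```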

## Scaling Up

When a single Loki node starts queueing ingestion, split into write and read paths:

```yaml
# /etc/loki/config.yml on write nodes
target: write

# /etc/loki/config.yml on read nodes
target: read
```

Run two write replicas behind a small nginx load balancer and one or two read replicas with a shared S3 bucket. Memcached in front of the query path caches chunks and slashes query latency. Past 500 GB per day, move to Loki microservices mode with separate distributor, ingester, querier, and query-frontend services.
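For the replicas to form a cluster they need a shared ring instead of the single-node `inmemory` store. A common sketch uses memberlist gossip; the hostnames below are placeholders:

```yaml
memberlist:
  join_members:
    - loki-write-1.internal.acme.com:7946
    - loki-write-2.internal.acme.com:7946

common:
  ring:
    kvstore:
      store: memberlist   # replaces the inmemory ring from the single-node config
```

Every Loki node, read or write, carries the same memberlist block so they all join one gossip ring.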

## Retention and Cost Control

Loki stores chunks in S3 indefinitely unless you set a retention period. Lifecycle rules on the bucket complement the `retention_period`:

```json
{
  "Rules": [
    {
      "ID": "archive-old-logs",
      "Status": "Enabled",
      "Filter": {"Prefix": "loki/"},
      "Transitions": [{"Days": 60, "StorageClass": "GLACIER"}]
    }
  ]
}
```
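Inside Loki itself, retention can also differ per stream via `retention_stream` in `limits_config`. A sketch, with example selectors (the matched streams get the shorter period, everything else falls back to `retention_period`):

```yaml
limits_config:
  retention_period: 720h          # default: 30 days
  retention_stream:
    - selector: '{job="nginx"}'   # access logs are cheap to regenerate
      priority: 1
      period: 168h                # keep only 7 days
```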

## Securing Loki

By default, Loki has no authentication. Put it behind an Nginx reverse proxy with basic auth or mTLS:

```nginx
location /loki/ {
    proxy_pass http://127.0.0.1:3100/;
    auth_basic "loki";
    auth_basic_user_file /etc/nginx/loki.htpasswd;
}
```

For multi-tenancy, enable `auth_enabled: true` and pass `X-Scope-OrgID` per tenant — essential when multiple teams share a cluster.

## FAQ

**How does Loki compare to Elasticsearch?** Loki is 5–10x cheaper per GB stored because it skips full-text indexing. It is faster for label-based filtering, slower for free-text search across unbounded time ranges.

**Is there a SaaS version?** Yes, Grafana Cloud Logs is hosted Loki with a generous free tier and straightforward paid plans.

**Can I send Kubernetes logs to Loki?** Yes. The Grafana Agent or the Loki Helm chart with `loki-stack` deploys Promtail as a DaemonSet that reads every container log.

**Does Loki support tracing correlation?** Yes, via `derivedFields` in Grafana. Click a trace ID in a log line to jump to Tempo.
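As a sketch, a provisioned Loki datasource with one derived field might look like this; the matcher regex and the Tempo datasource UID are assumptions you would adapt to your log format:

```yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    url: http://localhost:3100
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: 'trace_id=(\w+)'  # assumes logfmt-style trace_id=abc123
          url: '${__value.raw}'
          datasourceUid: tempo            # UID of your Tempo datasource
```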

**What is the minimum RAM I need?** 2 GB for a lab, 8–16 GB per write node at production ingestion rates of 50+ MB/s.

**Can I use Loki without Grafana?** Yes, the HTTP API is fully usable from `curl` or LogCLI, but Grafana is the only mature UI in 2026 and almost everyone runs both together.

**How does Loki handle multiline logs like Java stack traces?** Promtail has a `multiline` stage that joins continuation lines back into a single log entry before shipping. Configure it with the regex matching your stack-trace start pattern.
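A hedged sketch of that stage, assuming log lines that start with an ISO timestamp (adjust `firstline` to your format):

```yaml
pipeline_stages:
  - multiline:
      # A new entry starts with a timestamp; anything else is a continuation line
      firstline: '^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}'
      max_wait_time: 3s   # flush a partial entry after 3s of silence
```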

**Is there a Promtail replacement?** Grafana Alloy is the new collector that combines Promtail, Grafana Agent, and OpenTelemetry collectors into a single binary. It is the recommended choice for new deployments in 2026.

## Migrating from ELK

Teams moving from Elasticsearch to Loki are usually motivated by cost. The migration path: install Loki alongside the existing ELK stack, point new sources at Loki first, build dashboards in Grafana that match the most-used Kibana ones, then decommission ELK after a parallel-run period. Logstash configurations translate cleanly to Promtail pipelines for the most common patterns (timestamp parsing, regex extraction, drop filters). The biggest behavioral difference is querying: LogQL filters on labels first and then full text within the matched streams, while Lucene queries all fields by default. Adjust user expectations and provide example queries during the transition.

## Bloom Filters and Fast Search

Loki 3.x added Bloom filter support that dramatically speeds up substring queries on label-bounded streams. Enable it in the chunks config:

```yaml
bloom_compactor:
  enabled: true
bloom_gateway:
  enabled: true
```

Once compaction has built bloom filters for older blocks, queries with `|=` filters skip blocks that demonstrably do not contain the substring, often turning a five-minute query into a five-second one. The trade-off is extra storage in S3 for the bloom data, typically 10–15% of chunk size.

## Multi-Tenancy

For shared infrastructure, enable multi-tenancy with `auth_enabled: true`. Each request must carry an `X-Scope-OrgID` header identifying the tenant. Tenants get isolated streams, separate retention, and per-tenant rate limits. A small nginx in front of Loki maps incoming JWTs or API keys to tenant IDs, so users do not have to set the header by hand.

```yaml
limits_config:
  max_streams_per_user: 50000
  per_stream_rate_limit: 5MB
  per_stream_rate_limit_burst: 20MB
```

Set tenant-specific overrides in `runtime_config.yaml`; Loki re-reads the file without a restart.
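A sketch of what that file can contain (the tenant IDs and file path are examples, not values from this deployment):

```yaml
# In the main Loki config, point at the runtime file:
runtime_config:
  file: /etc/loki/runtime_config.yaml

# /etc/loki/runtime_config.yaml — per-tenant overrides:
overrides:
  team-payments:
    ingestion_rate_mb: 50
    retention_period: 2160h   # 90 days for this tenant only
  team-web:
    max_streams_per_user: 100000
```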

## Backup and Recovery

Loki's chunks are in S3, which is durable. The component you actually have to back up is the index: the TSDB files under the `index_` prefix. With the `tsdb` store (or the legacy `boltdb-shipper`), the index is shipped to S3 alongside the chunks, so a clean restore is just pointing a new Loki at the same bucket. Verify this quarterly by spinning up a fresh Loki instance against a copy of the bucket and querying historical data.

## Performance Troubleshooting

The most common Loki performance issue is “queries are slow.” Diagnose with the query stats endpoint:

```bash
curl -sG 'http://loki:3100/loki/api/v1/query_range' \
  --data-urlencode 'query={job="nginx"}' \
  --data-urlencode 'start=…' --data-urlencode 'end=…' | jq .data.stats
```

The output shows bytes processed, chunks downloaded, and time spent in each stage. If chunks downloaded is high, your label selector is too broad and you are scanning too much data; add a more selective label. If bytes processed is high relative to result size, add filters earlier in the query. If query time is dominated by `store_chunks_decompress_time`, the cache is undersized.
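In that last case, wiring a Memcached chunk cache is a few lines. A hedged sketch; the hostname is a placeholder, and the `dns+` prefix assumes Loki's DNS-based service discovery for memcached clients:

```yaml
chunk_store_config:
  chunk_cache_config:
    memcached_client:
      addresses: dns+memcached.internal.acme.com:11211
      timeout: 500ms
```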

## Securing Loki at the Edge

Beyond basic auth, terminate TLS at nginx and enforce mTLS for Promtail clients with a private CA — Step-CA is a natural fit. This prevents random hosts from injecting log data and gives you cryptographic identity for every log source. Combine with rate limits per client and Loki becomes resistant to log injection attacks that could otherwise be used to overwhelm your retention budget.
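On the Promtail side, client mTLS is a small addition to the `clients` block. A sketch, assuming certificates issued by your private CA at the paths shown:

```yaml
clients:
  - url: https://loki.internal.acme.com/loki/api/v1/push
    tls_config:
      ca_file: /etc/promtail/ca.crt       # private CA that signed the Loki endpoint cert
      cert_file: /etc/promtail/client.crt # per-host client certificate
      key_file: /etc/promtail/client.key
```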

## Cost Modeling

A practical rule of thumb in 2026: Loki ingest of 100 GB/day costs roughly $20–$40 per month in S3 storage and minor compute, compared to $300–$800 per month for an equivalent Elasticsearch deployment on hosted infrastructure. The savings scale linearly. Plan ingestion volumes with `loki_distributor_bytes_received_total` metrics and forecast a quarter ahead to right-size storage tiers and lifecycle rules. Most teams discover that 80% of their log volume comes from 5% of sources — usually a chatty debug logger left enabled in production. Tracking and curbing those is the highest-leverage cost optimization.
