
OpenTelemetry on Linux: Complete Observability Setup for Metrics, Logs, and Traces



OpenTelemetry has become the standard for observability instrumentation in modern infrastructure. If you are still running separate agents for metrics, logs, and traces — one for Prometheus, another for your logging backend, a third for distributed tracing — OpenTelemetry replaces that entire stack with a single, vendor-neutral framework. This guide covers deploying and using OpenTelemetry on Linux servers for practical infrastructure observability.


What Is OpenTelemetry

OpenTelemetry (OTel) is a CNCF project that provides a standardized framework for generating, collecting, and exporting telemetry data — specifically metrics, logs, and traces. It merges the older OpenCensus and OpenTracing projects and is now the industry standard, supported by every major observability vendor including Grafana, Datadog, Dynatrace, New Relic, AWS CloudWatch, Azure Monitor, and Honeycomb.

The key benefit is vendor neutrality. You instrument your application once using the OpenTelemetry SDK, and you can send that data to any backend. Switching from Grafana Cloud to Datadog becomes a configuration change rather than a re-instrumentation project. For infrastructure-level data, the OpenTelemetry Collector can replace node_exporter, Filebeat, and Jaeger agents with a single daemon.

Core Components and Architecture

The Three Pillars

  • Metrics — Numeric measurements over time (CPU usage, request rate, memory bytes). OpenTelemetry metrics are compatible with Prometheus format and can replace node_exporter for host metrics.
  • Logs — Timestamped text records from applications and the operating system. OTel collects, processes, and routes logs from files, journald, and application stdout.
  • Traces — Distributed request tracking across microservices. Traces show the path of a request through your system with timing for each hop.

The OpenTelemetry Collector

The Collector is the central piece for infrastructure deployments. It is a standalone binary that runs as a system service, receives telemetry from multiple sources (called receivers), processes and enriches it (processors), and exports to multiple backends (exporters).

# Conceptual pipeline:
[Receivers] → [Processors] → [Exporters]

# Example flow:
hostmetricsreceiver → batchprocessor → prometheusremotewriteexporter
journaldreceiver   → filterprocessor → lokiexporter
otlpreceiver       → resourceprocessor → otlpexporter (to Jaeger)
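The same flow can be expressed as a minimal Collector configuration. This is a sketch for orientation, not a production config: it scrapes host CPU and memory metrics and prints them with the debug exporter so you can see the pipeline wiring end to end.

```yaml
receivers:
  hostmetrics:
    scrapers:
      cpu:
      memory:

processors:
  batch:

exporters:
  debug:
    verbosity: normal

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [batch]
      exporters: [debug]
```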

Installing the OpenTelemetry Collector

Install on RHEL / Rocky Linux / Fedora

# Add the OpenTelemetry repository
cat > /etc/yum.repos.d/opentelemetry.repo << 'EOF'
[opentelemetry]
name=OpenTelemetry Repository
baseurl=https://packages.opentelemetry.io/rpm/packages/
enabled=1
gpgcheck=1
gpgkey=https://packages.opentelemetry.io/rpm/packages/gpg-key.pub
EOF

# Install the Collector Contrib distribution (includes all receivers/exporters)
dnf install otelcol-contrib

# Or install the core distribution (smaller, fewer components)
dnf install otelcol

Install on Ubuntu / Debian

# Add repository
wget -qO- https://packages.opentelemetry.io/deb/packages/gpg-key.pub | \
    gpg --dearmor -o /usr/share/keyrings/opentelemetry.gpg

echo "deb [signed-by=/usr/share/keyrings/opentelemetry.gpg] \
    https://packages.opentelemetry.io/deb/packages stable main" \
    > /etc/apt/sources.list.d/opentelemetry.list

apt update
apt install otelcol-contrib

Install via Binary Download

# Download the latest release directly (works on any distro)
OTEL_VERSION="0.117.0"
ARCH=$(uname -m | sed 's/x86_64/amd64/;s/aarch64/arm64/')

wget "https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v${OTEL_VERSION}/otelcol-contrib_${OTEL_VERSION}_linux_${ARCH}.tar.gz"

tar -xzf otelcol-contrib_${OTEL_VERSION}_linux_${ARCH}.tar.gz
install otelcol-contrib /usr/local/bin/

# Create systemd service
cat > /etc/systemd/system/otelcol.service << 'EOF'
[Unit]
Description=OpenTelemetry Collector
After=network.target

[Service]
Type=simple
User=otelcol
ExecStart=/usr/local/bin/otelcol-contrib --config=/etc/otelcol/config.yaml
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
EOF

useradd -r -s /usr/sbin/nologin otelcol
mkdir -p /etc/otelcol
systemctl daemon-reload
systemctl enable otelcol

Configuring the Collector for Linux Servers

The Collector configuration uses YAML and has four top-level sections: receivers, processors, exporters, and service (which wires them into pipelines).

# /etc/otelcol/config.yaml
# A practical baseline configuration for Linux server monitoring

receivers:
  # Receive metrics from applications using OTLP protocol
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  # Collect host system metrics
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
      memory:
      disk:
      filesystem:
        exclude_mount_points:
          mount_points: ["/dev", "/proc", "/sys", "/run/k3s"]
          match_type: strict
      network:
      load:
      processes:

  # Collect systemd/journald logs
  journald:
    directory: /run/log/journal
    units:
      - sshd
      - nginx
      - postgresql
    priority: warning

  # Prometheus scrape (to migrate existing node_exporter setups)
  prometheus:
    config:
      scrape_configs:
        - job_name: 'node-exporter'
          static_configs:
            - targets: ['localhost:9100']

processors:
  # Batch data to reduce network requests
  batch:
    timeout: 10s
    send_batch_size: 1024

  # Add resource attributes (host identification)
  resource:
    attributes:
      - key: service.name
        value: "linux-server"
        action: upsert
      - key: host.name
        from_attribute: host.name
        action: insert

  # Filter out noisy or irrelevant data
  filter/logs:
    error_mode: ignore
    logs:
      exclude:
        match_type: regexp
        bodies:
          - "^health.*check"

  # Add memory limiter to prevent OOM
  memory_limiter:
    check_interval: 1s
    limit_mib: 256
    spike_limit_mib: 64

exporters:
  # Export to Prometheus Remote Write (Mimir, Thanos, Cortex)
  prometheusremotewrite:
    endpoint: "http://mimir.monitoring.svc.cluster.local:9009/api/v1/push"
    tls:
      insecure: true

  # Export to Loki for logs
  loki:
    endpoint: "http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push"
    labels:
      resource_labels:
        - host.name
        - service.name

  # Export traces to a backend (Jaeger, Tempo, etc.)
  otlp/traces:
    endpoint: "tempo.monitoring.svc.cluster.local:4317"
    tls:
      insecure: true

  # Debug output (useful during configuration)
  debug:
    verbosity: normal

service:
  pipelines:
    metrics:
      receivers: [hostmetrics, otlp, prometheus]
      processors: [memory_limiter, resource, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [journald, otlp]
      processors: [memory_limiter, filter/logs, resource, batch]
      exporters: [loki]
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [otlp/traces]

  telemetry:
    metrics:
      address: 0.0.0.0:8888     # Collector's own metrics
    logs:
      level: warn

Collecting Linux Host Metrics

The hostmetrics receiver replaces node_exporter for most use cases. It collects all standard system metrics and exports them in OTel format, which can be converted to Prometheus format for Grafana dashboards.

# Verify the Collector is collecting host metrics
curl -s http://localhost:8888/metrics | grep otelcol_receiver_accepted_metric_points

# Check what metrics are being produced
curl -s http://localhost:8888/metrics | grep system_cpu
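OTel metric names use dots (system.cpu.utilization), while Prometheus metric names cannot contain dots, so the Prometheus exporters translate them to underscore-separated names. A rough illustration of the base translation (the real exporters may also append unit and _total suffixes):

```python
def otel_to_prometheus(name: str) -> str:
    """Translate an OTel dotted metric name to its Prometheus base name.

    This covers only the dot-to-underscore step; the actual Prometheus
    exporters may additionally append unit (e.g. _seconds) and _total
    suffixes depending on metric type.
    """
    # Prometheus metric names may not contain dots, so each dot
    # becomes an underscore.
    return name.replace(".", "_")

print(otel_to_prometheus("system.cpu.utilization"))  # system_cpu_utilization
print(otel_to_prometheus("system.memory.usage"))     # system_memory_usage
```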

Available Metric Scrapers

hostmetrics:
  scrapers:
    cpu:          # CPU utilization and time by state
    memory:       # Memory usage (used, free, cached, buffers)
    disk:         # Disk I/O operations and bytes
    filesystem:   # Filesystem usage by mount point
    network:      # Network interface packets, bytes, errors
    load:         # System load averages (1, 5, 15 min)
    processes:    # Total process count by state
    process:      # Per-process CPU, memory (resource intensive)
    paging:       # Swap/paging activity

# Optional metrics are disabled by default and enabled per scraper:
hostmetrics:
  scrapers:
    cpu:
      metrics:
        system.cpu.physical.count:
          enabled: true
        system.cpu.logical.count:
          enabled: true

Collecting System Logs

Journald (systemd logs)

receivers:
  journald:
    directory: /run/log/journal
    all: false           # Set to true to collect all units (very verbose)
    units:               # Specific unit names
      - sshd
      - nginx
      - docker
      - kubelet
    priority: notice    # Minimum priority: emerg, alert, crit, err, warning, notice, info, debug
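The priority option uses standard syslog severity levels, where a lower number means more severe; priority: notice collects notice and everything more severe. The cutoff logic can be sketched as:

```python
# Standard syslog severity levels (RFC 5424): lower number = more severe.
SYSLOG_LEVELS = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}

def is_collected(message_level: str, minimum: str) -> bool:
    """True if a message at message_level passes a `priority: minimum` filter."""
    return SYSLOG_LEVELS[message_level] <= SYSLOG_LEVELS[minimum]

print(is_collected("err", "notice"))    # True  (err is more severe than notice)
print(is_collected("debug", "notice"))  # False (debug is less severe)
```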

File-Based Log Collection

receivers:
  filelog:
    include:
      - /var/log/nginx/access.log
      - /var/log/nginx/error.log
      - /var/log/postgresql/*.log
      - /var/log/app/*.log
    exclude:
      - /var/log/**/*.gz
    start_at: end         # Only collect new logs (not historical)
    include_file_path: true
    include_file_name: false
    operators:
      - type: regex_parser
        regex: '^(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] (?P<message>.*)$'
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y/%m/%d %H:%M:%S'
        severity:
          parse_from: attributes.level

Sending Data to Monitoring Backends

Grafana Stack (Prometheus + Loki + Tempo)

exporters:
  prometheusremotewrite:
    endpoint: "https://prometheus-prod-01.grafana.net/api/prom/push"
    headers:
      Authorization: "Basic ${GRAFANA_PROMETHEUS_TOKEN}"

  loki:
    endpoint: "https://logs-prod-006.grafana.net/loki/api/v1/push"
    headers:
      Authorization: "Basic ${GRAFANA_LOKI_TOKEN}"

  otlp/tempo:
    endpoint: "tempo-prod-04.grafana.net:443"
    headers:
      authorization: "Basic ${GRAFANA_TEMPO_TOKEN}"
    tls:
      insecure: false

Datadog

exporters:
  datadog:
    api:
      key: "${DD_API_KEY}"
      site: "datadoghq.com"
    metrics:
      histograms:
        mode: distributions
    traces:
      compute_top_level_by_span_kind: true

OpenTelemetry on Kubernetes

# Install the OpenTelemetry Operator
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

# Create a DaemonSet Collector (one per node)
cat << 'EOF' | kubectl apply -f -
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-node-collector
  namespace: monitoring
spec:
  mode: daemonset
  config: |
    receivers:
      hostmetrics:
        collection_interval: 30s
        scrapers:
          cpu:
          memory:
          filesystem:
          network:
    exporters:
      prometheusremotewrite:
        endpoint: "http://mimir.monitoring:9009/api/v1/push"
    service:
      pipelines:
        metrics:
          receivers: [hostmetrics]
          exporters: [prometheusremotewrite]
EOF

Instrumenting Applications

Python Application

pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Configure tracing to export spans to the local Collector via OTLP/gRPC
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)))

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("process-request") as span:
    span.set_attribute("user.id", user_id)
    result = process_request(user_id)
    span.set_attribute("result.status", "success")

Troubleshooting the Collector

# Check Collector status
systemctl status otelcol

# View live logs
journalctl -u otelcol -f

# Check internal metrics (Collector's own health)
curl http://localhost:8888/metrics | grep otelcol_

# Key metrics to watch
# otelcol_receiver_accepted_metric_points - metrics being received
# otelcol_exporter_sent_metric_points - metrics successfully sent
# otelcol_exporter_send_failed_metric_points - export failures
# otelcol_processor_dropped_metric_points - dropped data (memory_limiter or filter)

# Validate configuration without starting
otelcol-contrib validate --config /etc/otelcol/config.yaml

# Test configuration with debug output
otelcol-contrib --config /etc/otelcol/config.yaml --set service.telemetry.logs.level=debug
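The ratio of failed to sent points from the Collector's self-metrics gives a quick export-health signal. A stdlib-only sketch that sums counters from Prometheus text-format output (as fetched from :8888/metrics); the sample exposition below is illustrative, and real output may carry a _total suffix, which the prefix match tolerates:

```python
def sum_counter(exposition: str, name: str) -> float:
    """Sum all samples of a counter across labels in Prometheus text format."""
    total = 0.0
    for line in exposition.splitlines():
        # Match by prefix so `name_total` variants are also counted;
        # comment lines (# HELP / # TYPE) never start with the metric name.
        if line.startswith(name):
            # Line format: metric_name{labels} value
            total += float(line.rsplit(" ", 1)[1])
    return total

# Illustrative sample of what `curl http://localhost:8888/metrics` returns.
sample = """\
otelcol_exporter_sent_metric_points{exporter="prometheusremotewrite"} 5000
otelcol_exporter_send_failed_metric_points{exporter="prometheusremotewrite"} 50
"""

sent = sum_counter(sample, "otelcol_exporter_sent_metric_points")
failed = sum_counter(sample, "otelcol_exporter_send_failed_metric_points")
print(f"export failure rate: {failed / (sent + failed):.1%}")  # 1.0%
```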

Common Issues

High memory usage: Enable the memory_limiter processor and tune limit_mib. Use the batch processor to reduce in-flight data volume.

Export failures: Check exporter endpoint connectivity from the server. Verify authentication tokens. Review rate limits on your backend.

Missing metrics: Run with debug verbosity to see what the hostmetrics receiver is producing. Some metrics require specific kernel versions or privileged access.
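For the missing-metrics case, one approach is to temporarily add the debug exporter alongside the real exporter in the metrics pipeline, so every data point hostmetrics emits is printed to the Collector log. A sketch of that temporary change:

```yaml
exporters:
  debug:
    verbosity: detailed   # print every data point to the Collector log

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [memory_limiter, resource, batch]
      exporters: [prometheusremotewrite, debug]   # debug added temporarily
```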

Conclusion

OpenTelemetry is not just another monitoring agent — it is the framework that ends the proliferation of purpose-specific agents on every Linux server. By deploying the OTel Collector, you get metrics, logs, and traces through a single pipeline with the flexibility to change backends without touching your instrumentation. Start with the hostmetrics and journald receivers to replace your existing node_exporter and log shipper, then gradually instrument applications as you migrate to the OTel SDK. The vendor-neutral foundation means your investment in instrumentation compounds over time regardless of which observability platform you use.
