OpenTelemetry has become the standard for observability instrumentation in modern infrastructure. If you are still running separate agents for metrics, logs, and traces — one for Prometheus, another for your logging backend, a third for distributed tracing — OpenTelemetry replaces that entire stack with a single, vendor-neutral framework. This guide covers deploying and using OpenTelemetry on Linux servers for practical infrastructure observability.
📑 Table of Contents
- What Is OpenTelemetry
- Core Components and Architecture
- The Three Pillars
- The OpenTelemetry Collector
- Installing the OpenTelemetry Collector
- Install on RHEL / Rocky Linux / Fedora
- Install on Ubuntu / Debian
- Install via Binary Download
- Configuring the Collector for Linux Servers
- Collecting Linux Host Metrics
- Available Metric Scrapers
- Collecting System Logs
- Journald (systemd logs)
- File-Based Log Collection
- Sending Data to Monitoring Backends
- Grafana Stack (Prometheus + Loki + Tempo)
- Datadog
- OpenTelemetry on Kubernetes
- Instrumenting Applications
- Python Application
- Troubleshooting the Collector
- Common Issues
- Conclusion
What Is OpenTelemetry
OpenTelemetry (OTel) is a CNCF project that provides a standardized framework for generating, collecting, and exporting telemetry data — specifically metrics, logs, and traces. It was formed by merging the older OpenCensus and OpenTracing projects and is now the de facto industry standard, supported by every major observability vendor, including Grafana, Datadog, Dynatrace, New Relic, AWS CloudWatch, Azure Monitor, and Honeycomb.
The key benefit is vendor neutrality. You instrument your application once using the OpenTelemetry SDK, and you can send that data to any backend. Switching from Grafana Cloud to Datadog becomes a configuration change rather than a re-instrumentation project. For infrastructure-level data, the OpenTelemetry Collector can replace node_exporter, Filebeat, and Jaeger agents with a single daemon.
Core Components and Architecture
The Three Pillars
- Metrics — Numeric measurements over time (CPU usage, request rate, memory bytes). OpenTelemetry metrics are compatible with Prometheus format and can replace node_exporter for host metrics.
- Logs — Timestamped text records from applications and the operating system. OTel collects, processes, and routes logs from files, journald, and application stdout.
- Traces — Distributed request tracking across microservices. Traces show the path of a request through your system with timing for each hop.
The OpenTelemetry Collector
The Collector is the central piece for infrastructure deployments. It is a standalone binary that runs as a system service, receives telemetry from multiple sources (called receivers), processes and enriches it (processors), and exports to multiple backends (exporters).
# Conceptual pipeline:
[Receivers] → [Processors] → [Exporters]
# Example flow:
hostmetricsreceiver → batchprocessor → prometheusremotewriteexporter
journaldreceiver → filterprocessor → lokiexporter
otlpreceiver → resourceprocessor → otlpexporter (to Jaeger)
Installing the OpenTelemetry Collector
Install on RHEL / Rocky Linux / Fedora
# Add the OpenTelemetry repository
cat > /etc/yum.repos.d/opentelemetry.repo << 'EOF'
[opentelemetry]
name=OpenTelemetry Repository
baseurl=https://packages.opentelemetry.io/rpm/packages/
enabled=1
gpgcheck=1
gpgkey=https://packages.opentelemetry.io/rpm/packages/gpg-key.pub
EOF
# Install the Collector Contrib distribution (includes all receivers/exporters)
dnf install otelcol-contrib
# Or install the core distribution (smaller, fewer components)
dnf install otelcol
Install on Ubuntu / Debian
# Add repository
wget -qO- https://packages.opentelemetry.io/deb/packages/gpg-key.pub | \
gpg --dearmor -o /usr/share/keyrings/opentelemetry.gpg
echo "deb [signed-by=/usr/share/keyrings/opentelemetry.gpg] \
https://packages.opentelemetry.io/deb/packages stable main" \
> /etc/apt/sources.list.d/opentelemetry.list
apt update
apt install otelcol-contrib
Install via Binary Download
# Download the latest release directly (works on any distro)
OTEL_VERSION="0.117.0"
ARCH=$(uname -m | sed 's/x86_64/amd64/;s/aarch64/arm64/')
wget "https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v${OTEL_VERSION}/otelcol-contrib_${OTEL_VERSION}_linux_${ARCH}.tar.gz"
tar -xzf otelcol-contrib_${OTEL_VERSION}_linux_${ARCH}.tar.gz
install otelcol-contrib /usr/local/bin/
# Create systemd service
cat > /etc/systemd/system/otelcol.service << 'EOF'
[Unit]
Description=OpenTelemetry Collector
After=network.target
[Service]
Type=simple
User=otelcol
ExecStart=/usr/local/bin/otelcol-contrib --config=/etc/otelcol/config.yaml
Restart=on-failure
RestartSec=5s
[Install]
WantedBy=multi-user.target
EOF
useradd -r -s /usr/sbin/nologin otelcol
mkdir -p /etc/otelcol
systemctl daemon-reload
systemctl enable otelcol
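The unit file above points at /etc/otelcol/config.yaml, which does not exist yet, so the service will fail to start until you create one. A minimal config that starts cleanly and prints what it collects (debug exporter only — swap in real exporters once the service runs):

```yaml
# /etc/otelcol/config.yaml - smallest useful starting point
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:

exporters:
  debug:
    verbosity: normal

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      exporters: [debug]
```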
Configuring the Collector for Linux Servers
The Collector configuration uses YAML and has four top-level sections: receivers, processors, exporters, and service (which wires them into pipelines).
# /etc/otelcol/config.yaml
# A practical baseline configuration for Linux server monitoring
receivers:
# Receive metrics from applications using OTLP protocol
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
# Collect host system metrics
hostmetrics:
collection_interval: 30s
scrapers:
cpu:
metrics:
system.cpu.utilization:
enabled: true
memory:
disk:
filesystem:
exclude_mount_points:
mount_points: ["/dev", "/proc", "/sys", "/run/k3s"]
match_type: strict
network:
load:
processes:
# Collect systemd/journald logs
journald:
directory: /run/log/journal
    units:             # specific units only; set all: true instead to collect everything
      - sshd
      - nginx
      - postgresql
    priority: warning
# Prometheus scrape (to migrate existing node_exporter setups)
prometheus:
config:
scrape_configs:
- job_name: 'node-exporter'
static_configs:
- targets: ['localhost:9100']
processors:
# Batch data to reduce network requests
batch:
timeout: 10s
send_batch_size: 1024
# Add resource attributes (host identification)
resource:
attributes:
- key: service.name
value: "linux-server"
action: upsert
      - key: host.name
        value: ${env:HOSTNAME}
        action: insert
# Filter out noisy or irrelevant data
filter/logs:
error_mode: ignore
logs:
exclude:
match_type: regexp
bodies:
- "^health.*check"
# Add memory limiter to prevent OOM
memory_limiter:
check_interval: 1s
limit_mib: 256
spike_limit_mib: 64
exporters:
# Export to Prometheus Remote Write (Mimir, Thanos, Cortex)
prometheusremotewrite:
endpoint: "http://mimir.monitoring.svc.cluster.local:9009/api/v1/push"
tls:
insecure: true
# Export to Loki for logs
loki:
endpoint: "http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push"
labels:
resource_labels:
- host.name
- service.name
# Export traces to a backend (Jaeger, Tempo, etc.)
otlp/traces:
endpoint: "tempo.monitoring.svc.cluster.local:4317"
tls:
insecure: true
# Debug output (useful during configuration)
debug:
verbosity: normal
service:
pipelines:
metrics:
receivers: [hostmetrics, otlp, prometheus]
processors: [memory_limiter, resource, batch]
exporters: [prometheusremotewrite]
logs:
receivers: [journald, otlp]
processors: [memory_limiter, filter/logs, resource, batch]
exporters: [loki]
traces:
receivers: [otlp]
processors: [memory_limiter, resource, batch]
exporters: [otlp/traces]
telemetry:
metrics:
address: 0.0.0.0:8888 # Collector's own metrics
logs:
level: warn
Collecting Linux Host Metrics
The hostmetrics receiver replaces node_exporter for most use cases. It collects all standard system metrics and exports them in OTel format, which can be converted to Prometheus format for Grafana dashboards.
# Verify the Collector is collecting host metrics
curl -s http://localhost:8888/metrics | grep otelcol_receiver_accepted_metric_points
# Check what metrics are being produced
curl -s http://localhost:8888/metrics | grep system_cpu
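When porting Grafana dashboards off node_exporter, the metric names change. A few common equivalents after Prometheus normalization (dots become underscores and unit suffixes are appended); these mappings are approximate, so verify against what your Collector version actually emits:

```
node_exporter metric              hostmetrics equivalent (Prometheus-normalized)
node_cpu_seconds_total            system_cpu_time_seconds_total
node_memory_MemFree_bytes         system_memory_usage_bytes{state="free"}
node_filesystem_avail_bytes       system_filesystem_usage_bytes{state="free"}  (approximate)
node_load1                        system_cpu_load_average_1m
```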
Available Metric Scrapers
hostmetrics:
scrapers:
cpu: # CPU utilization and time by state
memory: # Memory usage (used, free, cached, buffers)
disk: # Disk I/O operations and bytes
filesystem: # Filesystem usage by mount point
network: # Network interface packets, bytes, errors
load: # System load averages (1, 5, 15 min)
processes: # Total process count by state
process: # Per-process CPU, memory (resource intensive)
paging: # Swap/paging activity
cpu:
metrics:
system.cpu.physical.count:
enabled: true
system.cpu.logical.count:
enabled: true
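The per-process scraper noted above as resource intensive also usually needs root, and it logs an error for every process it cannot inspect. If you enable it, the contrib receiver exposes mute_* options to quiet those errors — a sketch, with option names taken from the contrib hostmetrics docs (verify against your Collector version):

```yaml
hostmetrics:
  scrapers:
    process:
      # Suppress errors for processes the collector user cannot read
      mute_process_name_error: true
      mute_process_exe_error: true
      metrics:
        process.cpu.utilization:
          enabled: true
```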
Collecting System Logs
Journald (systemd logs)
receivers:
journald:
directory: /run/log/journal
    all: false # set to true to collect from every unit (be careful - very verbose)
units: # Specific unit names
- sshd
- nginx
- docker
- kubelet
priority: notice # Minimum priority: emerg, alert, crit, err, warning, notice, info, debug
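One operational detail: the journald receiver works by invoking journalctl as the Collector's service user, so that user must be able to read the journal — on most distros that means membership in the systemd-journal group (for example, usermod -aG systemd-journal otelcol for the service user created earlier). Otherwise the pipeline simply produces no logs:

```yaml
receivers:
  journald:
    directory: /run/log/journal
    # journalctl runs as the Collector's service user; grant it journal
    # read access first, e.g.: usermod -aG systemd-journal otelcol
    units:
      - sshd
```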
File-Based Log Collection
receivers:
filelog:
include:
- /var/log/nginx/access.log
- /var/log/nginx/error.log
- /var/log/postgresql/*.log
- /var/log/app/*.log
exclude:
- /var/log/**/*.gz
start_at: end # Only collect new logs (not historical)
include_file_path: true
include_file_name: false
operators:
- type: regex_parser
        regex: '^(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] (?P<message>.*)$'
timestamp:
parse_from: attributes.timestamp
layout: '%Y/%m/%d %H:%M:%S'
severity:
parse_from: attributes.level
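It is worth checking the parser pattern against a real log line before deploying the config. Both Go's RE2 (used by the filelog operator) and Python's re module accept (?P&lt;name&gt;...) named groups, so a quick Python sketch with a sample nginx error-log line works as a test bench:

```python
import re

# Same pattern as the regex_parser operator above, checked against a
# sample nginx error-log line
pattern = re.compile(
    r'^(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) '
    r'\[(?P<level>\w+)\] (?P<message>.*)$'
)

sample = "2024/06/01 12:34:56 [error] 1234#0: *5 upstream timed out"
m = pattern.match(sample)
print(m.group("timestamp"), m.group("level"))  # 2024/06/01 12:34:56 error
```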
Sending Data to Monitoring Backends
Grafana Stack (Prometheus + Loki + Tempo)
exporters:
prometheusremotewrite:
endpoint: "https://prometheus-prod-01.grafana.net/api/prom/push"
headers:
Authorization: "Basic ${GRAFANA_PROMETHEUS_TOKEN}"
loki:
endpoint: "https://logs-prod-006.grafana.net/loki/api/v1/push"
headers:
Authorization: "Basic ${GRAFANA_LOKI_TOKEN}"
otlp/tempo:
endpoint: "tempo-prod-04.grafana.net:443"
headers:
authorization: "Basic ${GRAFANA_TEMPO_TOKEN}"
tls:
insecure: false
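The ${GRAFANA_*_TOKEN} values above are not raw API keys: Grafana Cloud's push endpoints use HTTP Basic auth, so the header value is the base64 encoding of "&lt;instance-id&gt;:&lt;api-key&gt;". A sketch of generating one (both values below are placeholders):

```shell
# Placeholders - substitute your stack's numeric instance ID and API key
INSTANCE_ID="123456"
API_KEY="glc_example_key"

# Basic credentials are base64("<instance-id>:<api-key>"); strip the
# trailing newline so the header value is a single token
GRAFANA_PROMETHEUS_TOKEN=$(printf '%s:%s' "$INSTANCE_ID" "$API_KEY" | base64 | tr -d '\n')
echo "$GRAFANA_PROMETHEUS_TOKEN"
```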
Datadog
exporters:
datadog:
api:
key: "${DD_API_KEY}"
site: "datadoghq.com"
metrics:
histograms:
mode: distributions
traces:
compute_top_level_by_span_kind: true
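Rather than hard-coding the key, the ${DD_API_KEY} reference is expanded from the Collector's environment at startup. With the systemd unit from earlier, one way to supply it is a drop-in file (the path and file name here are an assumption, not a Datadog requirement):

```ini
# /etc/systemd/system/otelcol.service.d/secrets.conf
[Service]
Environment="DD_API_KEY=<your-api-key>"
```

Run systemctl daemon-reload and restart the service after creating the drop-in.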
OpenTelemetry on Kubernetes
# Install the OpenTelemetry Operator
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
# Create a DaemonSet Collector (one per node)
cat << 'EOF' | kubectl apply -f -
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
name: otel-node-collector
namespace: monitoring
spec:
mode: daemonset
config: |
receivers:
hostmetrics:
collection_interval: 30s
scrapers:
cpu:
memory:
filesystem:
network:
exporters:
prometheusremotewrite:
endpoint: "http://mimir.monitoring:9009/api/v1/push"
service:
pipelines:
metrics:
receivers: [hostmetrics]
exporters: [prometheusremotewrite]
EOF
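The DaemonSet handles per-node metrics; a common companion is a gateway Collector in deployment mode that applications send OTLP traces to. A sketch under the same namespace and backend assumptions as above:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-gateway
  namespace: monitoring
spec:
  mode: deployment
  replicas: 2
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
    processors:
      batch:
    exporters:
      otlp:
        endpoint: "tempo.monitoring:4317"
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp]
```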
Instrumenting Applications
Python Application
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
# Configure tracing
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)))
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("process-request") as span:
    span.set_attribute("user.id", user_id)
    result = process_request(user_id)
    span.set_attribute("result.status", "success")
Troubleshooting the Collector
# Check Collector status
systemctl status otelcol
# View live logs
journalctl -u otelcol -f
# Check internal metrics (Collector's own health)
curl http://localhost:8888/metrics | grep otelcol_
# Key metrics to watch
# otelcol_receiver_accepted_metric_points - metrics being received
# otelcol_exporter_sent_metric_points - metrics successfully sent
# otelcol_exporter_send_failed_metric_points - export failures
# otelcol_processor_dropped_metric_points - dropped data (memory_limiter or filter)
# Validate configuration without starting
otelcol-contrib validate --config /etc/otelcol/config.yaml
# Test configuration with debug output
otelcol-contrib --config /etc/otelcol/config.yaml --set service.telemetry.logs.level=debug
Common Issues
High memory usage: Enable the memory_limiter processor and tune limit_mib. Use the batch processor to reduce in-flight data volume.
Export failures: Check exporter endpoint connectivity from the server. Verify authentication tokens. Review rate limits on your backend.
Missing metrics: Run with debug verbosity to see what the hostmetrics receiver is producing. Some metrics require specific kernel versions or privileged access.
Conclusion
OpenTelemetry is not just another monitoring agent — it is the framework that ends the proliferation of purpose-specific agents on every Linux server. By deploying the OTel Collector, you get metrics, logs, and traces through a single pipeline with the flexibility to change backends without touching your instrumentation. Start with the hostmetrics and journald receivers to replace your existing node_exporter and log shipper, then gradually instrument applications as you migrate to the OTel SDK. The vendor-neutral foundation means your investment in instrumentation compounds over time regardless of which observability platform you use.