
Fluent Bit on Linux: Complete Log Collection, Parsing, and Forwarding Guide

🎯 Key Takeaways

  • Why Fluent Bit for Log Collection
  • Fluent Bit Architecture: Inputs, Filters, Parsers, Outputs
  • Installing Fluent Bit on Linux
  • Core Configuration: fluent-bit.conf and parsers.conf

Fluent Bit has become the de facto lightweight log processor and forwarder for modern Linux infrastructure and Kubernetes environments. Written in C, it consumes a fraction of the memory that heavier alternatives require while still supporting dozens of inputs, filters, and output destinations. Whether you are shipping systemd journal entries to Elasticsearch, tailing application logs into Loki, or buffering records to S3, Fluent Bit handles the entire pipeline in a single, statically-linkable binary. This guide walks through deploying a production-grade Fluent Bit log pipeline on Linux: from installation through parsing, routing, and forwarding to multiple backends.

Why Fluent Bit for Log Collection

The log shipping landscape includes several mature tools. Fluentd is Fluent Bit's heavier sibling, written in Ruby with a large plugin ecosystem; it is excellent for complex routing logic but consumes 30-60 MB of RSS at idle. Logstash requires a JVM and excels at complex Grok parsing pipelines but is a poor fit for edge nodes or containers with tight memory limits. Filebeat is Elastic's lightweight alternative but has limited output destinations outside the Elastic stack and minimal transformation capability without an intermediate Logstash stage.

Fluent Bit typically idles at under 5 MB of RSS and under 1% CPU on a quiet node. It ships with native output plugins for Elasticsearch, OpenSearch, Loki, S3, CloudWatch, Kafka, Splunk, and more, eliminating the need for an intermediate aggregator in most pipelines. Its C core also makes it trivial to embed in constrained environments like IoT gateways and minimal container images.

Fluent Bit Architecture: Inputs, Filters, Parsers, Outputs

Fluent Bit processes log data through a linear pipeline of four conceptual stages:

  • Inputs collect data from sources: tail (file tailing), systemd (journald), forward (Fluentd protocol), tcp, syslog. Each input assigns a tag to every record used for routing.
  • Parsers decode raw text into structured key-value records using JSON, regex, LTSV, logfmt, or Lua functions.
  • Filters transform, enrich, or drop records in transit: record_modifier, grep, parser, modify, lua.
  • Outputs send processed records to destinations, with optional disk buffering to handle back-pressure and network failures.
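The four stages compose into a single configuration where the tag assigned by the input drives routing. A minimal sketch (paths and tag are illustrative; the json_app parser is defined later in parsers.conf), which tails JSON logs, keeps only error-level records, and prints them to stdout:

```
[SERVICE]
    Flush   5

[INPUT]
    Name    tail
    Tag     demo.app
    Path    /var/log/demo/*.log
    Parser  json_app

[FILTER]
    Name    grep
    Match   demo.*
    Regex   level error

[OUTPUT]
    Name    stdout
    Match   demo.*
```

The stdout output is useful while developing a pipeline; swap it for a real backend once the records look right.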

Installing Fluent Bit on Linux

RHEL 9 / Rocky Linux 9

rpm --import https://packages.fluentbit.io/fluentbit.key

cat > /etc/yum.repos.d/fluent-bit.repo << 'EOF'
[fluent-bit]
name=Fluent Bit
baseurl=https://packages.fluentbit.io/centos/9/
gpgcheck=1
gpgkey=https://packages.fluentbit.io/fluentbit.key
enabled=1
EOF

dnf install -y fluent-bit
systemctl enable --now fluent-bit

Ubuntu 22.04 / 24.04

curl -fsSL https://packages.fluentbit.io/fluentbit.key | \
  gpg --dearmor -o /usr/share/keyrings/fluentbit.gpg

echo "deb [signed-by=/usr/share/keyrings/fluentbit.gpg] \
  https://packages.fluentbit.io/ubuntu/$(lsb_release -cs) \
  $(lsb_release -cs) main" \
  | tee /etc/apt/sources.list.d/fluent-bit.list

apt update && apt install -y fluent-bit
systemctl enable --now fluent-bit

The binary installs to /opt/fluent-bit/bin/fluent-bit; configuration lives in /etc/fluent-bit/.

Core Configuration: fluent-bit.conf and parsers.conf

The primary configuration file uses an INI-style block syntax. Each block is introduced by a section header in square brackets.

# /etc/fluent-bit/fluent-bit.conf

[SERVICE]
    Flush           5
    Daemon          Off
    Log_Level       info
    Parsers_File    parsers.conf
    HTTP_Server     On
    HTTP_Listen     0.0.0.0
    HTTP_Port       2020
    storage.path    /var/log/fluent-bit/buffer/
    storage.sync    normal
    storage.backlog.mem_limit 50M

The storage.path directory enables disk-based buffering: when an output destination is unavailable, records queue to disk rather than being dropped. Set storage.backlog.mem_limit to cap memory used when replaying buffered data.
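Note that disk buffering is opt-in per input: chunks are only written under storage.path for inputs that set storage.type to filesystem (the default is in-memory chunks). A sketch, with an illustrative tag:

```
[INPUT]
    Name          systemd
    Tag           host.systemd
    storage.type  filesystem
```

With filesystem-backed chunks in place, outputs can then bound their on-disk backlog with storage.total_limit_size, as shown in the Elasticsearch example later in this guide.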

Collecting Systemd/journald Logs

[INPUT]
    Name              systemd
    Tag               host.systemd
    Systemd_Filter    _SYSTEMD_UNIT=sshd.service
    Systemd_Filter    _SYSTEMD_UNIT=nginx.service
    Read_From_Tail    On
    Strip_Underscores On

[FILTER]
    Name   record_modifier
    Match  host.systemd
    Record hostname ${HOSTNAME}
    Record source   systemd

Strip_Underscores On converts journald field names like _HOSTNAME to HOSTNAME, friendlier for downstream indexers. Read_From_Tail On begins at the current journal position instead of replaying the full history on startup.

Collecting Application Log Files with tail Input

The tail input follows files and tracks byte offsets in a database so restarts do not replay already-processed data.

[INPUT]
    Name              tail
    Tag               app.nginx
    Path              /var/log/nginx/access.log
    Path_Key          filename
    Parser            nginx_combined
    DB                /var/lib/fluent-bit/nginx-tail.db
    Mem_Buf_Limit     10MB
    Skip_Long_Lines   On
    Refresh_Interval  10

[INPUT]
    Name              tail
    Tag               app.myapp
    Path              /var/log/myapp/*.log
    multiline.parser  java
    DB                /var/lib/fluent-bit/myapp-tail.db
    Mem_Buf_Limit     20MB

The DB parameter is critical in production; without it, a service restart replays the entire file from the beginning. Mem_Buf_Limit caps in-memory buffering before Fluent Bit begins pausing the input.

Parsing and Filtering Logs

Regex and JSON Parsers in parsers.conf

[PARSER]
    Name        nginx_combined
    Format      regex
    Regex       ^(?<remote>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S %z

[PARSER]
    Name        json_app
    Format      json
    Time_Key    timestamp
    Time_Format %Y-%m-%dT%H:%M:%S.%LZ
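Before deploying a regex parser, it helps to exercise the pattern against sample lines outside Fluent Bit. A quick sketch in Python (note that Fluent Bit's Onigmo engine writes named groups as (?&lt;name&gt;...), while Python's re module needs (?P&lt;name&gt;...)):

```python
import re

# nginx_combined pattern from parsers.conf, rewritten with Python-style named groups
pattern = re.compile(
    r'^(?P<remote>[^ ]*) [^ ]* (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^"]*?)(?: +\S*)?)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
)

line = '203.0.113.7 - alice [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 612'
record = pattern.match(line).groupdict()
print(record["remote"], record["method"], record["path"], record["code"])
# -> 203.0.113.7 GET /index.html 200
```

If the match comes back None or a field lands in the wrong group, fix the pattern here before restarting the agent.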

Grep Filter: Drop Noisy Log Lines

# Drop health checks, metrics scrapes, and 2xx responses to reduce volume
[FILTER]
    Name    grep
    Match   app.nginx
    Exclude path /health
    Exclude path /metrics
    Exclude code 2[0-9][0-9]

Lua Filter for Custom Transformation

# /etc/fluent-bit/scripts/enrich.lua
function enrich_record(tag, timestamp, record)
    if record["level"] == nil then
        record["level"] = "unknown"
    end
    record["level"] = string.upper(record["level"])
    record["processed_at"] = os.date("!%Y-%m-%dT%H:%M:%SZ")
    return 1, timestamp, record
end

# /etc/fluent-bit/fluent-bit.conf
[FILTER]
    Name   lua
    Match  app.*
    script /etc/fluent-bit/scripts/enrich.lua
    call   enrich_record

Forwarding to Elasticsearch and OpenSearch

[OUTPUT]
    Name                es
    Match               app.*
    Host                elasticsearch.internal
    Port                9200
    Index               logs-app
    HTTP_User           fluent-bit
    HTTP_Passwd         ${ES_PASSWORD}
    tls                 On
    tls.verify          On
    Suppress_Type_Name  On
    Retry_Limit         5
    storage.total_limit_size 500M

The same es plugin works with OpenSearch. Suppress_Type_Name On is required for both OpenSearch 2.x and Elasticsearch 8.x, which removed mapping types. storage.total_limit_size caps on-disk buffering during extended outages.
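Fluent Bit also ships a dedicated opensearch output plugin (since version 1.9) if you prefer it over the es compatibility path. A sketch with placeholder host and credentials:

```
[OUTPUT]
    Name                opensearch
    Match               app.*
    Host                opensearch.internal
    Port                9200
    Index               logs-app
    HTTP_User           fluent-bit
    HTTP_Passwd         ${OS_PASSWORD}
    tls                 On
    tls.verify          On
    Suppress_Type_Name  On
```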

Forwarding to Loki (Grafana Log Aggregation)

Fluent Bit ships a native Loki output plugin. Labels become Loki stream selectors, so choose them carefully to avoid high-cardinality index explosions.

[OUTPUT]
    Name          loki
    Match         app.*
    Host          loki.monitoring.svc.cluster.local
    Port          3100
    Labels        job=fluent-bit, env=production, host=${HOSTNAME}
    Label_Keys    $filename,$level
    Remove_Keys   filename,level
    Line_Format   json
    Retry_Limit   5

Setting Line_Format json preserves structured fields inside the log line, enabling LogQL's | json pipeline queries in Grafana. Use Label_Keys sparingly: each unique label combination creates a separate stream in Loki's index.

Forwarding to S3 and CloudWatch

Amazon S3 for Log Archival

[OUTPUT]
    Name                         s3
    Match                        host.*
    bucket                       my-log-archive-bucket
    region                       us-east-1
    s3_key_format                /logs/%Y/%m/%d/%H/$TAG[1]_%UUID.gz
    s3_key_format_tag_delimiters .
    total_file_size              100M
    upload_timeout               10m
    compression                  gzip
    store_dir                    /var/lib/fluent-bit/s3-buffer

Amazon CloudWatch Logs

[OUTPUT]
    Name              cloudwatch_logs
    Match             host.systemd
    region            us-east-1
    log_group_name    /linux/systemd
    log_stream_prefix host-
    auto_create_group On
    retry_limit       5

Both AWS plugins use the standard credential chain: environment variables, EC2 instance profile, or ECS task role. On EC2, no explicit credentials are needed when the instance role has logs:CreateLogStream and logs:PutLogEvents permissions.
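As a minimal sketch, an instance role backing these two outputs needs roughly the following IAM permissions (the bucket and log group names match the examples above; auto_create_group additionally requires logs:CreateLogGroup, and exact ARN scoping is up to your environment):

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "arn:aws:logs:us-east-1:*:log-group:/linux/systemd*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::my-log-archive-bucket/*"
    }
  ]
}
```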

Kubernetes DaemonSet Deployment

helm repo add fluent https://fluent.github.io/helm-charts
helm repo update

helm upgrade --install fluent-bit fluent/fluent-bit \
  --namespace monitoring \
  --create-namespace \
  --set tolerations[0].operator=Exists \
  --set resources.requests.memory=64Mi \
  --set resources.limits.memory=256Mi \
  -f fluent-bit-values.yaml

# fluent-bit-values.yaml: production Kubernetes configuration
config:
  inputs: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            cri
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On
        Refresh_Interval  10

  filters: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Merge_Log           On
        Keep_Log            Off
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On

  outputs: |
    [OUTPUT]
        Name  loki
        Match *
        Host  loki.monitoring.svc.cluster.local
        Port  3100
        Labels job=fluent-bit
        Auto_Kubernetes_Labels On

Monitoring Fluent Bit with its Built-in HTTP Server

When HTTP_Server On is set in [SERVICE], Fluent Bit exposes a metrics endpoint Prometheus can scrape.

# Check metrics in JSON format
curl -s http://localhost:2020/api/v1/metrics | jq .

# Prometheus-format metrics
curl -s http://localhost:2020/api/v1/metrics/prometheus

# Key metrics to monitor:
# fluentbit_input_records_total          - records ingested per input
# fluentbit_output_errors_total          - delivery errors per output
# fluentbit_output_retries_failed_total  - permanently dropped records (alert on this!)
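A monitoring sidecar or cron check can consume the JSON endpoint directly. A sketch, assuming the v1 API groups per-plugin counters under an "output" object with a retries_failed field (the payload below is a hand-written illustration, not captured output; verify field names against your Fluent Bit version):

```python
import json

# Sample payload shaped like /api/v1/metrics (values are illustrative)
payload = json.loads("""
{
  "input":  {"tail.0": {"records": 1200, "bytes": 345600}},
  "output": {"es.0": {"proc_records": 1180, "errors": 3,
                      "retries": 5, "retries_failed": 2}}
}
""")

# Alert when any output has permanently dropped records
dropped = {name: stats["retries_failed"]
           for name, stats in payload["output"].items()
           if stats.get("retries_failed", 0) > 0}
print(dropped)  # -> {'es.0': 2}
```

In production you would fetch the payload from http://localhost:2020/api/v1/metrics instead of a literal string, and page when dropped is non-empty.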

Troubleshooting and Debug Mode

# Run interactively with verbose debug logging
/opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf -v

# Test configuration syntax without running
/opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf --dry-run

# Temporarily enable debug via systemd drop-in
mkdir -p /etc/systemd/system/fluent-bit.service.d
cat > /etc/systemd/system/fluent-bit.service.d/debug.conf << 'EOF'
[Service]
ExecStart=
ExecStart=/opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf -v
EOF
systemctl daemon-reload && systemctl restart fluent-bit
journalctl -fu fluent-bit

Common issues: if records reach an input but never appear at the output, verify that the Match pattern aligns with the Tag assigned by the input; tags are case-sensitive and support glob wildcards (app.* matches app.nginx but not appserver). If an output retries repeatedly, check that the storage.path directory exists and is writable by the fluent-bit user, and confirm the destination service is reachable on its expected port.
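When debugging routing, it can help to test a Match pattern against your tags offline. Python's fnmatch is a reasonable stand-in for a quick check of glob behavior (an approximation for illustration, not Fluent Bit's actual matcher):

```python
from fnmatch import fnmatch

# Tags from this guide, checked against the Match pattern "app.*"
tags = ["app.nginx", "app.myapp", "appserver", "host.systemd"]
for tag in tags:
    print(tag, fnmatch(tag, "app.*"))
# app.nginx True
# app.myapp True
# appserver False
# host.systemd False
```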

Conclusion

Fluent Bit delivers a compelling combination of low resource overhead, broad plugin support, and production reliability that makes it the right choice for log collection on both bare-metal Linux servers and Kubernetes clusters. Using the tail input with a persistent database, applying targeted parsers to extract structured fields, enabling disk-based buffering on outputs, and alerting on the retries_failed_total metric gives you a log pipeline that handles network interruptions and service restarts without data loss. From a single node shipping systemd journal entries to a hundred-node Kubernetes cluster forwarding pod logs to Loki and S3, Fluent Bit scales gracefully without the operational weight of heavier alternatives.

About Ramesh Sundararamaiah

Red Hat Certified Architect

Expert in Linux system administration, DevOps automation, and cloud infrastructure. Specializing in Red Hat Enterprise Linux, CentOS, Ubuntu, Docker, Ansible, and enterprise IT solutions.
