Fluent Bit on Linux: Complete Log Collection, Parsing, and Forwarding Guide
Table of Contents
- Why Fluent Bit for Log Collection
- Fluent Bit Architecture: Inputs, Filters, Parsers, Outputs
- Installing Fluent Bit on Linux
- Core Configuration: fluent-bit.conf and parsers.conf
- Collecting Systemd/journald Logs
- Collecting Application Log Files with tail Input
- Parsing and Filtering Logs
- Forwarding to Elasticsearch and OpenSearch
- Forwarding to Loki (Grafana Log Aggregation)
- Forwarding to S3 and CloudWatch
- Kubernetes DaemonSet Deployment
- Monitoring Fluent Bit with its Built-in HTTP Server
- Troubleshooting and Debug Mode
- Conclusion
Fluent Bit has become the de facto lightweight log processor and forwarder for modern Linux infrastructure and Kubernetes environments. Written in C, it consumes a fraction of the memory that heavier alternatives require while still supporting dozens of inputs, filters, and output destinations. Whether you are shipping systemd journal entries to Elasticsearch, tailing application logs into Loki, or buffering records to S3, Fluent Bit handles the entire pipeline in a single, statically linkable binary. This guide walks through deploying a production-grade Fluent Bit log pipeline on Linux, from installation through parsing, routing, and forwarding to multiple backends.
Why Fluent Bit for Log Collection
The log shipping landscape includes several mature tools. Fluentd is Fluent Bit's heavier sibling, written in Ruby with a large plugin ecosystem: excellent for complex routing logic, but it consumes 30–60 MB of RSS at idle. Logstash requires a JVM and excels at complex Grok parsing pipelines but is a poor fit for edge nodes or containers with tight memory limits. Filebeat is Elastic's lightweight alternative but has limited output destinations outside the Elastic stack and minimal transformation capability without an intermediate Logstash stage.
Fluent Bit typically idles at under 5 MB of RSS and under 1% CPU on a quiet node. It ships with native output plugins for Elasticsearch, OpenSearch, Loki, S3, CloudWatch, Kafka, Splunk, and more, eliminating the need for an intermediate aggregator in most pipelines. Its C core also makes it trivial to embed in constrained environments like IoT gateways and minimal container images.
Fluent Bit Architecture: Inputs, Filters, Parsers, Outputs
Fluent Bit processes log data through a linear pipeline of four conceptual stages:
- Inputs collect data from sources: tail (file tailing), systemd (journald), forward (Fluentd protocol), tcp, syslog. Each input assigns a tag to every record, which is used for routing.
- Parsers decode raw text into structured key-value records using JSON, regex, LTSV, logfmt, or Lua functions.
- Filters transform, enrich, or drop records in transit: record_modifier, grep, parser, modify, lua.
- Outputs send processed records to destinations, with optional disk buffering to handle back-pressure and network failures.
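The stages above are wired together through tags and Match patterns. As a minimal sketch (using the built-in dummy input and stdout output, both shipped in the standard distribution), the tag assigned at the input is what the output's Match glob selects:

```ini
[INPUT]
    Name   dummy
    Tag    demo.test
    Dummy  {"message": "hello"}

[OUTPUT]
    Name   stdout
    # Match is a glob evaluated against each record's tag
    Match  demo.*
```

Running `fluent-bit -c` with this file prints one structured record per second to stdout, confirming that the tag demo.test is routed by the demo.* pattern.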
Installing Fluent Bit on Linux
RHEL 9 / Rocky Linux 9
rpm --import https://packages.fluentbit.io/fluentbit.key
cat > /etc/yum.repos.d/fluent-bit.repo << 'EOF'
[fluent-bit]
name=Fluent Bit
baseurl=https://packages.fluentbit.io/centos/9/
gpgcheck=1
gpgkey=https://packages.fluentbit.io/fluentbit.key
enabled=1
EOF
dnf install -y fluent-bit
systemctl enable --now fluent-bit
Ubuntu 22.04 / 24.04
curl -fsSL https://packages.fluentbit.io/fluentbit.key | \
gpg --dearmor -o /usr/share/keyrings/fluentbit.gpg
# "jammy" matches Ubuntu 22.04; use "noble" for Ubuntu 24.04
echo "deb [signed-by=/usr/share/keyrings/fluentbit.gpg] \
https://packages.fluentbit.io/ubuntu/jammy jammy main" \
| tee /etc/apt/sources.list.d/fluent-bit.list
apt update && apt install -y fluent-bit
systemctl enable --now fluent-bit
The binary installs to /opt/fluent-bit/bin/fluent-bit; configuration lives in /etc/fluent-bit/.
Core Configuration: fluent-bit.conf and parsers.conf
The primary configuration file uses an INI-style block syntax. Each block is introduced by a section header in square brackets.
# /etc/fluent-bit/fluent-bit.conf
[SERVICE]
Flush 5
Daemon Off
Log_Level info
Parsers_File parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
storage.path /var/log/fluent-bit/buffer/
storage.sync normal
storage.backlog.mem_limit 50M
The storage.path directory enables disk-based buffering: when an output destination is unavailable, records queue to disk rather than being dropped. Set storage.backlog.mem_limit to cap memory used when replaying buffered data.
Collecting Systemd/journald Logs
[INPUT]
Name systemd
Tag host.systemd
Systemd_Filter _SYSTEMD_UNIT=sshd.service
Systemd_Filter _SYSTEMD_UNIT=nginx.service
Read_From_Tail On
Strip_Underscores On
[FILTER]
Name record_modifier
Match host.systemd
Record hostname ${HOSTNAME}
Record source systemd
Strip_Underscores On converts journald field names like _HOSTNAME to HOSTNAME, friendlier for downstream indexers. Read_From_Tail On begins at the current journal position instead of replaying the full history on startup.
Collecting Application Log Files with tail Input
The tail input follows files and tracks byte offsets in a database so restarts do not replay already-processed data.
[INPUT]
Name tail
Tag app.nginx
Path /var/log/nginx/access.log
Path_Key filename
Parser nginx_combined
DB /var/lib/fluent-bit/nginx-tail.db
Mem_Buf_Limit 10MB
Skip_Long_Lines On
Refresh_Interval 10
[INPUT]
Name tail
Tag app.myapp
Path /var/log/myapp/*.log
# use the built-in Java stack-trace multiline parser
multiline.parser java
DB /var/lib/fluent-bit/myapp-tail.db
Mem_Buf_Limit 20MB
The DB parameter is critical in production: without it, a service restart replays the entire file from the beginning. Mem_Buf_Limit caps in-memory buffering before Fluent Bit begins pausing the input.
Parsing and Filtering Logs
Regex and JSON Parsers in parsers.conf
[PARSER]
Name nginx_combined
Format regex
Regex ^(?<remote>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name json_app
Format json
Time_Key timestamp
Time_Format %Y-%m-%dT%H:%M:%S.%LZ
Grep Filter: Drop Noisy Log Lines
[FILTER]
Name grep
Match app.nginx
Exclude path /health
Exclude path /metrics
# Drop 2xx success codes to reduce volume
Exclude code 2[0-9][0-9]
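To sanity-check an Exclude pattern before deploying it, you can exercise the same regex with grep. The snippet below anchors the pattern for an exact match; note that Fluent Bit's grep filter itself matches the pattern anywhere in the field value:

```shell
# Status codes that survive the filter above (2xx lines are excluded)
printf '%s\n' 200 204 301 404 502 | grep -Ev '^2[0-9][0-9]$'
# prints: 301, 404, 502 (one per line)
```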
Lua Filter for Custom Transformation
# /etc/fluent-bit/scripts/enrich.lua
function enrich_record(tag, timestamp, record)
if record["level"] == nil then
record["level"] = "unknown"
end
record["level"] = string.upper(record["level"])
record["processed_at"] = os.date("!%Y-%m-%dT%H:%M:%SZ")
return 1, timestamp, record
end
[FILTER]
Name lua
Match app.*
script /etc/fluent-bit/scripts/enrich.lua
call enrich_record
Forwarding to Elasticsearch and OpenSearch
[OUTPUT]
Name es
Match app.*
Host elasticsearch.internal
Port 9200
Index logs-app
HTTP_User fluent-bit
HTTP_Passwd ${ES_PASSWORD}
tls On
tls.verify On
# Required for ES 8.x and OpenSearch 2.x
Suppress_Type_Name On
Retry_Limit 5
storage.total_limit_size 500M
The same es plugin works with OpenSearch. Suppress_Type_Name On is required for both OpenSearch 2.x and Elasticsearch 8.x, which removed mapping types. storage.total_limit_size caps on-disk buffering during extended outages.
Forwarding to Loki (Grafana Log Aggregation)
Fluent Bit ships a native Loki output plugin. Labels become Loki stream selectors, so choose them carefully to avoid high-cardinality index explosions.
[OUTPUT]
Name loki
Match app.*
Host loki.monitoring.svc.cluster.local
Port 3100
Labels job=fluent-bit, env=production, host=${HOSTNAME}
Label_Keys $filename,$level
Remove_Keys filename,level
Line_Format json
Retry_Limit 5
Setting Line_Format json preserves structured fields inside the log line, enabling LogQL's | json pipeline queries in Grafana. Use Label_Keys sparingly: each unique label combination creates a separate stream in Loki's index.
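With the labels configured above, a Grafana Explore query can then filter on the structured fields directly. A hypothetical LogQL query (field name level assumed to be present in the JSON log line) might look like:

```logql
{job="fluent-bit", env="production"} | json | level="ERROR"
```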
Forwarding to S3 and CloudWatch
Amazon S3 for Log Archival
[OUTPUT]
Name s3
Match host.*
bucket my-log-archive-bucket
region us-east-1
s3_key_format /logs/%Y/%m/%d/%H/$TAG[1]_%UUID.gz
s3_key_format_tag_delimiters .
total_file_size 100M
upload_timeout 10m
compression gzip
store_dir /var/lib/fluent-bit/s3-buffer
Amazon CloudWatch Logs
[OUTPUT]
Name cloudwatch_logs
Match host.systemd
region us-east-1
log_group_name /linux/systemd
log_stream_prefix host-
log_stream_template $TAG
auto_create_group On
retry_limit 5
Both AWS plugins use the standard credential chain: environment variables, EC2 instance profile, or ECS task role. On EC2, no explicit credentials are needed when the instance role has logs:CreateLogStream and logs:PutLogEvents permissions.
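As a sketch, a minimal instance-role policy covering both outputs could look like the following; the bucket name and log-group ARN are placeholders matching the examples above, and logs:CreateLogGroup is only needed because auto_create_group is On:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::my-log-archive-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:us-east-1:*:log-group:/linux/systemd:*"
    }
  ]
}
```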
Kubernetes DaemonSet Deployment
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
helm upgrade --install fluent-bit fluent/fluent-bit \
--namespace monitoring \
--create-namespace \
--set 'tolerations[0].operator=Exists' \
--set resources.requests.memory=64Mi \
--set resources.limits.memory=256Mi \
-f fluent-bit-values.yaml
# fluent-bit-values.yaml β production Kubernetes configuration
config:
inputs: |
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*.log
Parser cri
DB /var/log/flb_kube.db
Mem_Buf_Limit 50MB
Skip_Long_Lines On
Refresh_Interval 10
filters: |
[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc:443
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
Merge_Log On
Keep_Log Off
K8S-Logging.Parser On
K8S-Logging.Exclude On
outputs: |
[OUTPUT]
Name loki
Match *
Host loki.monitoring.svc.cluster.local
Port 3100
Labels job=fluent-bit
Auto_Kubernetes_Labels On
Monitoring Fluent Bit with its Built-in HTTP Server
When HTTP_Server On is set in [SERVICE], Fluent Bit exposes a metrics endpoint Prometheus can scrape.
# Check metrics in JSON format
curl -s http://localhost:2020/api/v1/metrics | jq .
# Prometheus-format metrics
curl -s http://localhost:2020/api/v1/metrics/prometheus
# Key metrics to monitor:
# fluentbit_input_records_total β records ingested per input
# fluentbit_output_errors_total β delivery errors per output
# fluentbit_output_retries_failed_total β permanently dropped records (alert on this!)
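As a sketch of how an alert check might consume the Prometheus endpoint, the snippet below scans sample scrape output (metric values here are invented for illustration; in practice you would pipe `curl -s http://localhost:2020/api/v1/metrics/prometheus` into the same awk program) and flags any output plugin reporting failed retries:

```shell
# Flag any retries_failed_total series with a nonzero value
awk '$1 ~ /retries_failed_total/ && $2 > 0 { print "ALERT:", $1, "=", $2 }' << 'EOF'
fluentbit_output_proc_records_total{name="loki.0"} 12345
fluentbit_output_retries_failed_total{name="loki.0"} 3
fluentbit_output_retries_failed_total{name="s3.1"} 0
EOF
```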
Troubleshooting and Debug Mode
# Run interactively with verbose debug logging
/opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf -v
# Test configuration syntax without running
/opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf --dry-run
# Temporarily enable debug via systemd drop-in
mkdir -p /etc/systemd/system/fluent-bit.service.d
cat > /etc/systemd/system/fluent-bit.service.d/debug.conf << 'EOF'
[Service]
ExecStart=
ExecStart=/opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf -v
EOF
systemctl daemon-reload && systemctl restart fluent-bit
journalctl -fu fluent-bit
Common issues: if records reach an input but never appear at the output, verify that the Match pattern aligns with the Tag assigned by the input; tags are case-sensitive and support glob wildcards (app.* matches app.nginx but not appserver). If an output retries repeatedly, check that the storage.path directory exists and is writable by the fluent-bit user, and confirm the destination service is reachable on its expected port.
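Shell case globs behave much like Fluent Bit's Match wildcards, so you can sanity-check a pattern quickly before editing the config:

```shell
# app.* requires the "app." prefix, so appserver falls through
for tag in app.nginx app.myapp appserver; do
  case "$tag" in
    app.*) echo "$tag -> matched" ;;
    *)     echo "$tag -> no match" ;;
  esac
done
# prints: app.nginx -> matched, app.myapp -> matched, appserver -> no match
```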
Conclusion
Fluent Bit delivers a compelling combination of low resource overhead, broad plugin support, and production reliability that makes it the right choice for log collection on both bare-metal Linux servers and Kubernetes clusters. Using the tail input with a persistent database, applying targeted parsers to extract structured fields, enabling disk-based buffering on outputs, and alerting on the retries_failed_total metric gives you a log pipeline that handles network interruptions and service restarts without data loss. From a single node shipping systemd journal entries to a hundred-node Kubernetes cluster forwarding pod logs to Loki and S3, Fluent Bit scales gracefully without the operational weight of heavier alternatives.
About Ramesh Sundararamaiah
Red Hat Certified Architect
Expert in Linux system administration, DevOps automation, and cloud infrastructure. Specializing in Red Hat Enterprise Linux, CentOS, Ubuntu, Docker, Ansible, and enterprise IT solutions.