MariaDB Galera Cluster High-Availability Setup on AlmaLinux 9 in 2026
If your business cannot afford to lose a single write or tolerate even a few minutes of database downtime, a single MariaDB instance is a time bomb. MariaDB Galera Cluster gives you synchronous multi-master replication across three or more nodes: every write lands on every node before the client sees success, any node can accept writes, and failure of any single node is invisible to the application. In 2026 MariaDB 11.4 LTS with Galera 4 has become the default pick for teams who want HA without paying for Oracle RAC or depending on managed cloud databases. This guide walks through a full three-node Galera cluster installation on AlmaLinux 9.
## How Galera Works
Galera is a wsrep-based replication plugin that ships with MariaDB. Replication is certification-based: a transaction executes locally, its writeset is broadcast to all nodes at commit time, and each node independently certifies it against concurrent writesets. If certification fails, the transaction rolls back everywhere. The result is virtually synchronous replication: no replica lag, and no split brain as long as you run at least three nodes.
The catch: write latency is gated by the slowest node, because every transaction waits for cluster-wide certification before it commits. Galera buys you availability, not write throughput.
## Prerequisites
Three AlmaLinux 9 hosts, each with 2+ vCPU and 4+ GB RAM, on the same low-latency network. Assign static IPs; this guide uses 10.0.0.11, 10.0.0.12, and 10.0.0.13. Synchronize time with chronyd, because clock skew kills Galera:
```bash
sudo dnf install -y chrony
sudo systemctl enable --now chronyd
```
Open firewall ports 3306, 4567, 4568, and 4444 between all three nodes:
```bash
sudo firewall-cmd --permanent --add-port=3306/tcp   # MySQL client connections
sudo firewall-cmd --permanent --add-port=4567/tcp   # Galera group communication
sudo firewall-cmd --permanent --add-port=4568/tcp   # incremental state transfer (IST)
sudo firewall-cmd --permanent --add-port=4444/tcp   # full state snapshot transfer (SST)
sudo firewall-cmd --permanent --add-port=4567/udp   # group communication over UDP
sudo firewall-cmd --reload
```
Set SELinux to permissive mode for the initial cluster bootstrap (`sudo setenforce 0`), or keep it enforcing and label the Galera ports for mysqld (for example with `semanage port -a -t mysqld_port_t -p tcp 4567`) so state transfers are not silently blocked.
## Installing MariaDB 11.4
Add the MariaDB repo for 11.4:
```bash
sudo tee /etc/yum.repos.d/mariadb.repo <<'EOF'
[mariadb]
name = MariaDB 11.4
baseurl = https://rpm.mariadb.org/11.4/rhel/$releasever/$basearch
gpgkey = https://rpm.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck = 1
module_hotfixes = 1
EOF
sudo dnf install -y MariaDB-server MariaDB-client galera-4
```
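Installing the packages does not by itself form a cluster; each node also needs a `[galera]` section in its MariaDB config. A minimal sketch, assuming the three IPs above and a cluster name of `acme-prod` (adjust `wsrep_node_address` and `wsrep_node_name` on each node):

```ini
[galera]
wsrep_on = ON
wsrep_provider = /usr/lib64/galera-4/libgalera_smm.so
wsrep_cluster_name = acme-prod
wsrep_cluster_address = gcomm://10.0.0.11,10.0.0.12,10.0.0.13
wsrep_node_address = 10.0.0.11
wsrep_node_name = node1
# Galera requires row-based binlog events and interleaved autoinc locking
binlog_format = ROW
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
```

Bootstrap the first node with `sudo galera_new_cluster`, then start the remaining nodes normally with `sudo systemctl start mariadb`. Verify with `SHOW STATUS LIKE 'wsrep_cluster_size';`, which should report 3.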
## Tuning for Low Latency
If your three nodes share a rack or datacenter, keep `wsrep_slave_threads` (renamed `wsrep_applier_threads` in current MariaDB releases) at 8-16 and set `innodb_flush_log_at_trx_commit = 2` for a good balance of durability and latency. For cross-AZ clusters, measure round-trip time first, and do not attempt synchronous replication over links with more than 10 ms latency.
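As a sketch, the tuning above lands in the node's MariaDB config; the values are starting points to benchmark against your workload, not universal defaults:

```ini
[mysqld]
# Parallel applier threads for incoming writesets; 8-16 for same-rack clusters
wsrep_slave_threads = 8
# Flush the InnoDB log once per second instead of at every commit;
# replication to the other nodes covers the small local-durability window
innodb_flush_log_at_trx_commit = 2
```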
## FAQ
**Can I run Galera on two nodes?** Technically yes, but any node failure will cause the survivor to go read-only. Three nodes is the minimum for real HA.
**Is Percona XtraDB Cluster different?** Same Galera engine under a different brand. Configuration is interchangeable.
**Does Galera support foreign keys and triggers?** Yes, with some caveats around DDL statements and non-deterministic triggers.
**What is the write performance hit?** Typically 10-20% versus a standalone node. High-latency networks amplify this.
**Can I mix Galera versions across nodes?** Briefly during upgrades, yes. Not long term.
**Does Galera support row-based logical backups?** Yes: `mysqldump`, `mariadb-dump`, and mydumper all work normally. For consistent multi-table dumps, use `--single-transaction` against any node.
**What is the right hardware for a Galera node?** Identical specs across all nodes, fast NVMe storage, 10 Gbps networking minimum, and at least 16 GB RAM for any production workload. Mixed-spec clusters slow down to the weakest node.
**Can I do a rolling upgrade across major versions?** Within the same major series (11.4.x to 11.4.y), yes. Across major versions (10.11 to 11.4), do a rolling upgrade with one node at a time, validating cluster health after each, and never run mixed for more than the upgrade window.
## Split Brain Prevention with Arbitrators
A two-node Galera cluster cannot survive a network partition because neither side has quorum. The galera arbitrator (`garbd`) is a lightweight daemon that participates in the quorum vote without storing data. Run it on a third small VM and a two-node “real” cluster behaves like a three-node one for quorum purposes:
```bash
sudo dnf install -y galera-4
garbd --address gcomm://10.0.0.11,10.0.0.12 --group acme-prod --daemon
```
This is the cheapest way to get HA on a budget that does not stretch to three full database servers.
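The quorum arithmetic behind this is worth internalizing. A small sketch of the majority rule (pure arithmetic, no Galera dependency):

```shell
# Majority quorum: a partition keeps accepting writes only while it
# holds strictly more than half of the cluster's votes.
votes_needed() { echo $(( $1 / 2 + 1 )); }

# Two data nodes: majority is 2 of 2, so either node's failure loses quorum.
votes_needed 2
# Two data nodes plus garbd: majority is 2 of 3, so one failure is survivable.
votes_needed 3
```

This is exactly why garbd turns a fragile two-node pair into a cluster that tolerates a single failure: it adds a vote without adding a data copy.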
## Schema Changes Without Downtime
DDL is the painful part of multi-master replication. By default Galera serializes ALTER TABLE across the cluster as a Total Order Isolation (TOI) operation, blocking writes everywhere for the duration. For large tables this is a multi-hour outage in disguise. Use Rolling Schema Upgrade (RSU) mode to apply DDL one node at a time:
```sql
SET SESSION wsrep_OSU_method = 'RSU';
ALTER TABLE orders ADD COLUMN region VARCHAR(16);
SET SESSION wsrep_OSU_method = 'TOI';
```
The node temporarily desyncs, applies the DDL locally, then rejoins. Repeat on each node. Combine with `pt-online-schema-change` from Percona Toolkit for the most demanding migrations: it builds a shadow table, copies data in chunks, and swaps atomically.
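A rolling RSU deployment is easy to script. The sketch below only assembles and prints the per-node statement batch (a dry run; the node list and the idea of piping into the `mariadb` client are assumptions to adapt):

```shell
# Dry-run RSU rollout: print the statement batch each node would receive.
NODES="10.0.0.11 10.0.0.12 10.0.0.13"
DDL="ALTER TABLE orders ADD COLUMN region VARCHAR(16);"

rsu_plan() {
  for node in $NODES; do
    # Real rollout: pipe this batch into the client on $node and confirm
    # the node has rejoined (wsrep_local_state_comment = Synced) before
    # moving to the next one.
    echo "-- ${node}: SET SESSION wsrep_OSU_method = 'RSU'; ${DDL} SET SESSION wsrep_OSU_method = 'TOI';"
  done
}
rsu_plan
```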
## Read Scaling and Connection Routing
Galera’s synchronous replication means every node has identical data. Send write traffic to a single node to minimize certification conflicts and load-balance reads across all three. A SQL-aware proxy such as ProxySQL handles this natively with query routing rules:
```sql
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT', 20, 1), (2, 1, '^.*', 10, 1);
```
Hostgroup 10 is the writer pool with one node; hostgroup 20 is the reader pool with all three. ProxySQL handles failover automatically when the writer dies.
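The routing logic itself is just first-match dispatch on the statement prefix; a tiny sketch of the decision those two rules encode (a simplification of ProxySQL's regex matching, for illustration only):

```shell
# First-match routing, mirroring the two rules above:
# SELECTs go to the reader hostgroup, everything else to the writer.
route() {
  case "$1" in
    SELECT*) echo 20 ;;  # hostgroup 20: reader pool, all three nodes
    *)       echo 10 ;;  # hostgroup 10: writer pool, single node
  esac
}
route "SELECT id FROM orders WHERE region = 'eu'"
route "UPDATE orders SET region = 'eu' WHERE id = 7"
```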
## Encryption at Rest and in Transit
Modern compliance regimes require both. For replication encryption, configure SSL on each node:
```ini
[mysqld]
ssl-ca = /etc/mysql/ssl/ca.pem
ssl-cert = /etc/mysql/ssl/server-cert.pem
ssl-key = /etc/mysql/ssl/server-key.pem
require_secure_transport = ON
wsrep_provider_options = "socket.ssl=yes;socket.ssl_key=/etc/mysql/ssl/server-key.pem;socket.ssl_cert=/etc/mysql/ssl/server-cert.pem;socket.ssl_ca=/etc/mysql/ssl/ca.pem"
```
For data at rest, enable InnoDB tablespace encryption with a keyring plugin. The `file_key_management` plugin works with a local key file for testing; for production, use HashiCorp Vault or AWS KMS via the corresponding plugin.
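For the `file_key_management` test setup, the key file holds `<key-id>;<hex-key>` lines. A sketch generating one 256-bit key (the target path in the comment is a hypothetical example):

```shell
# file_key_management expects lines of "<numeric key id>;<hex-encoded key>".
# 32 random bytes hex-encoded gives a 256-bit key for AES-256.
gen_key_line() { echo "1;$(openssl rand -hex 32)"; }

gen_key_line
# Production: write it root-owned with mode 600, e.g.
#   gen_key_line | sudo install -m 600 /dev/stdin /etc/mysql/encryption/keyfile
```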
## Operational Runbooks
Have written, tested procedures for the routine emergencies: the entire cluster going down, the writer node failing under load, an SST taking longer than expected, and a node refusing to rejoin after a crash. The most common production incident is “the cluster lost quorum at 3 a.m.” The recovery is `galera_new_cluster` on the most up-to-date node, but identifying which node has the highest `seqno` in its `grastate.dat` file is the step everyone forgets at 3 a.m. Document it, test it on a staging cluster monthly, and your real failures become routine.
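Finding the most advanced node can be scripted. A sketch that extracts `seqno` from a `grastate.dat` and picks the highest; the sample files here stand in for `/var/lib/mysql/grastate.dat` on each node:

```shell
# grastate.dat is a small key: value file; seqno is the last committed
# transaction. A seqno of -1 means an unclean shutdown, which needs
# mariadbd --wsrep-recover to find the real position instead.
seqno_of() { awk -F': *' '$1 == "seqno" { print $2 }' "$1"; }

# Illustrative sample files standing in for each node's grastate.dat
printf 'version: 2.1\nseqno: 1523\n' > /tmp/node1.grastate
printf 'version: 2.1\nseqno: 1527\n' > /tmp/node2.grastate

best=-1
for f in /tmp/node1.grastate /tmp/node2.grastate; do
  s=$(seqno_of "$f")
  [ "$s" -gt "$best" ] && best=$s && bestfile=$f
done
echo "bootstrap from: $bestfile (seqno $best)"
```

Recent Galera versions also record `safe_to_bootstrap: 1` in `grastate.dat` on the last node to leave the cluster, which is the authoritative signal when present.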
## Performance Benchmarks
A baseline three-node Galera cluster on modern hardware (8 vCPU, 32 GB RAM, NVMe, 10 Gbps network) handles roughly 5,000β8,000 transactions per second with a typical OLTP mix. Read scaling is near-linear across nodes when read queries are distributed. Write throughput is bounded by the slowest node and the certification round-trip latency, so a cluster spanning two data centers 5 ms apart will commit substantially slower than one in a single rack. Benchmark with sysbench under your real workload before committing to a topology.
About Ramesh Sundararamaiah
Red Hat Certified Architect
Expert in Linux system administration, DevOps automation, and cloud infrastructure. Specializing in Red Hat Enterprise Linux, CentOS, Ubuntu, Docker, Ansible, and enterprise IT solutions.