Installing GlusterFS: A Step-by-Step Guide for Debian 11 Users

GlusterFS, also known as the Gluster File System, is an open-source distributed file system developed by RedHat. Designed for scalability, GlusterFS combines several servers into a singular entity, allowing users to connect to a GlusterFS volume seamlessly. Capable of handling petabytes of data, it is known for its ease of installation, maintenance, and scalability.

In this tutorial, you will learn to set up GlusterFS—a distributed, scalable network filesystem—on Debian 11 servers. By the end, you’ll have a GlusterFS volume set to automatically replicate data across multiple servers, ensuring a high-availability file system. Moreover, you’ll explore how to use ‘parted’, a Linux partitioning tool, to configure additional disks on Debian servers. The tutorial wraps up with verifying data replication and high availability on GlusterFS between several Debian servers.

Prerequisites

To complete this guide, ensure you have the following:

Two or three Debian 11 servers
A non-root user with sudo privileges or root access

The setup example uses three Debian 11 servers:

Hostname    IP Address
--------------------------
node1       192.168.5.50
node2       192.168.5.56
node3       192.168.5.57

Once you have these requirements ready, let’s proceed with installing GlusterFS.

Setup Hostname and FQDN

Initially, configure the hostname and FQDN (Fully Qualified Domain Name) for each Debian server. This is essential for GlusterFS operations. You can set the hostname using the ‘hostnamectl‘ command and configure the FQDN via the ‘/etc/hosts‘ file.

Execute these commands on each server to set up the hostname:

# run on node1
sudo hostnamectl set-hostname node1.home.lan

# run on node2
sudo hostnamectl set-hostname node2.home.lan

# run on node3
sudo hostnamectl set-hostname node3.home.lan

Next, edit the ‘/etc/hosts‘ file on each server with your preferred editor, like nano:

sudo nano /etc/hosts

Add these lines to map IP addresses to hostnames:

192.168.5.50  node1.home.lan  node1
192.168.5.56  node2.home.lan  node2
192.168.5.57  node3.home.lan  node3

Press Ctrl+x to exit, y to confirm changes, and ENTER to save.

Verify the FQDN on each server:

hostname -f
cat /etc/hosts

You should get outputs similar to: node1.home.lan on node1, node2.home.lan on node2, and node3.home.lan on node3.

Setting up Disk Partition

For GlusterFS deployment, it is recommended to use a designated drive. In this demonstration, each server utilizes an additional disk ‘/dev/vdb’ for GlusterFS installation. Here, you’ll learn disk setup on Linux using ‘fdisk’ via the terminal.

Begin by verifying available disks with:

sudo fdisk -l

You should see two disks on ‘node1‘: ‘/dev/vda‘ (operating system) and ‘/dev/vdb‘ (unconfigured).

To partition ‘/dev/vdb‘, use:

sudo fdisk /dev/vdb

Create a partition: enter ‘n‘.
Select partition type: ‘p‘ for primary.
Define partition number: ‘1‘.
Accept default first sector: press ENTER.
For the last sector, input ‘+5GB‘.
Apply changes: enter ‘w‘.

You’ll receive confirmation: ‘The partition table has been altered‘.

Now that the partition is set, format it to ext4:

sudo mkfs -t ext4 /dev/vdb1

Output will confirm: ‘/dev/vda1‘ formatted as ext4.

Setup Auto-Mount Partition

To auto-mount the partition ‘/dev/vdb1‘, configure the ‘/etc/fstab‘ file. Create storage directories for GlusterFS data.

Create target directories:

# run on node1
mkdir -p /data/node1

# run on node2
mkdir -p /data/node2

# run on node3
mkdir -p /data/node3

Edit ‘/etc/fstab‘ in nano:

sudo nano /etc/fstab

Add these entries for auto-mount:

# for node1
/dev/vdb1    /data/node1    ext4    defaults    0    1

# for node2
/dev/vdb1    /data/node2    ext4    defaults    0    1

# for node3
/dev/vdb1    /data/node3    ext4    defaults    0    1

Mount and verify configurations:

sudo mount -a

Create ‘brick0’ on the new partition:

# run on node1
mkdir -p /data/node1/brick0

# run on node2
mkdir -p /data/node2/brick0

# run on node3
mkdir -p /data/node3/brick0

Installing GlusterFS Server

Install GlusterFS on all Debian servers for the GlusterFS cluster. Run these commands on servers node1, node2, and node3.

Install necessary dependencies:

sudo apt install gnupg2 apt-transport-https software-properties-common

Fetch and configure the GlusterFS GPG key:

curl https://download.gluster.org/pub/gluster/glusterfs/10/rsa.pub | gpg --dearmor > /usr/share/keyrings/glusterfs-archive-keyring.gpg

Add the GlusterFS repository:

DEBID=$(grep 'VERSION_ID=' /etc/os-release | cut -d '=' -f 2 | tr -d '"')
DEBVER=$(grep 'VERSION=' /etc/os-release | grep -Eo '[a-z]+')
DEBARCH=$(dpkg --print-architecture)

echo "deb [signed-by=/usr/share/keyrings/glusterfs-archive-keyring.gpg] https://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/${DEBID}/${DEBARCH}/apt ${DEBVER} main" | sudo tee /etc/apt/sources.list.d/gluster.list

Update package index:

sudo apt update

Install the GlusterFS server package:

sudo apt install glusterfs-server

Enable and start the GlusterFS service:

sudo systemctl start glusterd
sudo systemctl enable glusterd

Verify the GlusterFS service is running and enabled:

sudo systemctl status glusterd

You should see ‘active (running)‘ confirming that GlusterFS is operational.

Initializing Storage Pool

Next, initialize the GlusterFS cluster across all three servers (node1, node2, and node3), starting from ‘node1‘. Add nodes ‘node2‘ and ‘node3‘ to the cluster.

Ensure each server is reachable via hostname or FQDN—validate with these commands:

ping node2.home.lan
ping node3.home.lan

From ‘node1’, initiate the GlusterFS cluster with:

sudo gluster peer probe node2.home.lan
sudo gluster peer probe node3.home.lan

Successful outputs will read ‘peer probe: success‘.

On ‘node2’, verify cluster status:

sudo gluster peer status

The output will reflect two connected peers: node1 and node3.

Repeat the check on ‘node3’:

sudo gluster peer status

For a broad view of clusters, run:

sudo gluster pool list

Having established the GlusterFS cluster with three Debian servers, proceed to create a volume and mount it from a client.

Creating Replicated Volume

GlusterFS offers multiple volume types: Distributed, Replicated, Distributed Replicated, Dispersed, and Distributed Dispersed. Refer to the official GlusterFS Documentation for more volume type details.

We’ll create a Replicated volume across three servers to automatically sync data within the storage cluster.

Execute the following to create a new Replicated volume named ‘testVolume‘:

sudo gluster volume create testVolume replica 3 node1.home.lan:/data/node1/brick0 node2.home.lan:/data/node2/brick0 node3.home.lan:/data/node3/brick0

The ‘volume create: testVolume: success‘ output indicates successful volume creation.

Start ‘testVolume‘ to make it operational:

sudo gluster volume start testVolume

Success message: ‘volume start: testVolume: success‘ confirms readiness.

Verify volume details:

sudo gluster volume info

The output will reflect a Replicated volume ‘testVolume‘ across three servers (node1, node2, and node3).

Mount GlusterFS Volume on Client

This step shows you how to mount a GlusterFS volume from a client machine—illustrated here using an Ubuntu/Debian client named ‘client‘. You will mount the ‘testVolume‘ volume and set up auto-mount with ‘/etc/fstab‘.

Edit the ‘/etc/hosts‘ file to map GlusterFS server IPs:

sudo nano /etc/hosts

Add the following server details:

192.168.5.50  node1.home.lan  node1
192.168.5.56  node2.home.lan  node2
192.168.5.57  node3.home.lan  node3

Save and exit.

Install GlusterFS client package:

sudo apt install glusterfs-client

Post-installation, create the directory ‘/data‘ for volume mounting:

mkdir /data

Mount ‘testVolume‘ to ‘/data‘ using:

sudo mount.glusterfs node1.home.lan:/testVolume /data

Confirm with:

sudo df -h

You should see ‘testVolume‘ mounted to ‘/data‘.

Configure auto-mount in ‘/etc/fstab‘ to ensure persistence after reboot:

sudo nano /etc/fstab

Add:

node1.home.lan:/testVolume /data glusterfs defaults,_netdev 0 0

Save and exit.

Test Replication and High-Availability

On the client, create files ‘1-15.md‘ in ‘/data’:

cd /data
touch file{1..15}.md

Verify the files:

ls

Check replication on ‘node1‘ in ‘/data/node1/brick0‘:

cd /data/node1/brick0
ls

The files ‘1-15.md‘ should be visible.

Repeat checks on ‘node2‘ and ‘node3‘:

To test high availability, shut down ‘node1‘ and verify client connectivity:

sudo poweroff

On ‘node2‘, check cluster status:

sudo gluster peer status

‘node1‘ will show ‘Disconnected‘.

From the client, confirm connectivity remains intact:

cd /data
ls

High-availability verification confirms system robustness.

Conclusion

In this tutorial, you’ve set up a GlusterFS cluster with three Debian 11 servers. From configuring disk partitions on Linux using fdisk to auto-mounting via ‘/etc/fstab’, this guide covers essential steps to effectively create a Replicated GlusterFS volume. Additionally, you’ve mounted a GlusterFS volume from a client machine.

Enhance your GlusterFS setup by adding more disks and servers for a robust, high-availability networked filesystem. For more advanced administration, explore the GlusterFS official documentation.

FAQ

What is GlusterFS?: GlusterFS is an open-source distributed file system capable of handling petabytes of data, providing an easy-to-install, maintain, and scalable storage solution.
What are the prerequisites for setting up GlusterFS on Debian 11?: You need two or three Debian 11 servers and a non-root user with sudo privileges.
How do I ensure high availability with GlusterFS?: High availability is ensured by configuring a Replicated volume across multiple servers, enabling data replication and uninterrupted access even when a server is down.
Can I add more servers to the existing GlusterFS cluster?: Yes, you can expand the GlusterFS cluster by adding more servers to enhance redundancy and storage capacity.