🏡 Homelab II: Proxmox cluster, ZFS and NFS

In the previous part of this series, I assembled (and modified) the hardware and set up the base operating systems on the machines. In this part, I’ll go over how to connect the Proxmox nodes into a cluster, add a quorum device and provision some storage with ZFS. One of the ZFS datasets will also be shared over NFS to hold container templates, ISOs, and snippets.

Cluster

If you only have one machine in your homelab, you can skip this step.

To create a cluster, pick one node to initialize it on:

root@r720$ pvecm create rob-lab

And now it’s a one-node Proxmox cluster.

Then, on the node being added, join the cluster via the first node’s IP address:

root@nuc$ pvecm add 192.168.1.100

Now it’s a two-node cluster:

root@nuc$ pvecm status
Cluster information
-------------------
Name:             rob-lab
Config Version:   2
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Dec 30 20:22:25 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1.9
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.1.100
0x00000002          1 192.168.1.200 (local)

Under the hood, Proxmox uses the Corosync cluster engine, which relies on a voting system in which each node in the cluster gets a vote. Ideally there would be an odd number of nodes, but since I only have two machines I’m going to set up the Raspberry Pi as an extra voter so that the cluster can properly reach quorum. The Pi will be configured as a corosync qdevice:

pi@piprimary$ sudo apt install corosync-qnetd corosync-qdevice

Unfortunately, the qdevice setup requires password-based SSH authentication as the root user, so the Pi’s SSH configuration will temporarily be changed to allow root login and a root password must be set:

pi@piprimary$ sudo su -
root@piprimary$ passwd
New password:
Retype new password:
passwd: password updated successfully
root@piprimary$ vi /etc/ssh/sshd_config # Set PermitRootLogin to "yes"
root@piprimary$ systemctl restart sshd

The qdevice package needs to be installed on each of the Proxmox nodes as well:

root@r720$ apt install corosync-qdevice
root@nuc$ apt install corosync-qdevice

Adding the Pi as a qdevice to the cluster is slightly different from adding a normal node. On an existing cluster node, use pvecm qdevice setup to add the Pi by IP:

root@r720$ pvecm qdevice setup 192.168.1.254

Now it’s still a two-node cluster, but with three members and three expected quorum votes:

root@r720$ pvecm status
Cluster information
-------------------
Name:             rob-lab
Config Version:   3
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Fri Dec 31 12:41:02 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.9
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.1.100 (local)
0x00000002          1    A,V,NMW 192.168.1.200
0x00000000          1            Qdevice

Back on the Pi, disable SSH root login:

root@piprimary$ vi /etc/ssh/sshd_config # Set PermitRootLogin to "no"
root@piprimary$ systemctl restart sshd
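
While still on the Pi, the qnetd side of things can be sanity-checked as well. This is an optional extra that wasn’t part of my setup; a minimal sketch assuming the corosync-qnetd-tool utility that ships with the corosync-qnetd package:

pi@piprimary$ sudo corosync-qnetd-tool -l # List the clusters and nodes connected to this qdevice
pi@piprimary$ sudo corosync-qnetd-tool -s # Show overall qnetd status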

Redundancy with ZFS

While the NUC will just use its single SSD for the host OS and all workload storage, the PowerEdge has a number of drives that need to be configured with ZFS.

ZFS filesystems are built on top of virtual storage pools. For now, there will be two pools, ssd-mirror and rusty-z2, as mentioned in the first post in this series. The third pool, wolves-z, will be handled later, since the entire controller connecting those drives will be passed through to a VM.

Create a mirrored pool of two drives called ssd-mirror:

root@r720$ zpool create ssd-mirror mirror /dev/sdo /dev/sdq

Create a RAID-Z2 pool of 11 drives called rusty-z2 (the /dev/ prefix can be omitted):

root@r720$ zpool create rusty-z2 raidz2 sde sdf sdg sdh sdi sdj sdk sdl sdm sdn sdp
root@r720$ zfs list
NAME         USED  AVAIL     REFER  MOUNTPOINT
rusty-z2    1.10M  7.49T      219K  /rusty-z2
ssd-mirror   528K   899G       96K  /ssd-mirror
root@r720$ zpool status
  pool: rusty-z2
 state: ONLINE
config:

  NAME                           STATE     READ WRITE CKSUM
  rusty-z2                       ONLINE       0     0     0
    raidz2-0                     ONLINE       0     0     0
      ata-ST91000640NS_9XG3QG5J  ONLINE       0     0     0
      ata-ST91000640NS_9XG3WGKZ  ONLINE       0     0     0
      ata-ST91000640NS_9XG3VHK5  ONLINE       0     0     0
      ata-ST91000640NS_9XG3TRW7  ONLINE       0     0     0
      scsi-35000c50083a28083     ONLINE       0     0     0
      scsi-35000c50083a0395b     ONLINE       0     0     0
      ata-ST91000640NS_9XG3WGCA  ONLINE       0     0     0
      ata-ST91000640NS_9XG3V6JB  ONLINE       0     0     0
      ata-ST91000640NS_9XG40C6A  ONLINE       0     0     0
      ata-ST91000640NS_9XG40JQH  ONLINE       0     0     0
      ata-ST91000640NS_9XG3VAEC  ONLINE       0     0     0

errors: No known data errors

  pool: ssd-mirror
 state: ONLINE
config:

  NAME                                  STATE     READ WRITE CKSUM
  ssd-mirror                            ONLINE       0     0     0
    mirror-0                            ONLINE       0     0     0
      ata-CT1000MX500SSD1_2147E5E74EEA  ONLINE       0     0     0
      ata-CT1000MX500SSD1_2147E5E73F89  ONLINE       0     0     0

errors: No known data errors
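
One thing I didn’t do above, but that’s worth knowing about: pool properties like ashift and compression are usually set at creation time. A rough sketch of what that could look like (illustrative only; the ashift=12 value assumes 4K-sector drives, which is an assumption on my part):

root@r720$ zpool create -o ashift=12 -O compression=lz4 ssd-mirror mirror /dev/sdo /dev/sdq
root@r720$ zfs set compression=lz4 rusty-z2 # Compression can also be enabled later; it only applies to new writes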

To make these available to Proxmox, they’ll need to be added manually to /etc/pve/storage.cfg like so:

zfspool: rusty-z2
  pool rusty-z2
  content images,rootdir
  mountpoint /rusty-z2
  nodes r720

zfspool: ssd-mirror
  pool ssd-mirror
  content images,rootdir
  mountpoint /ssd-mirror
  nodes r720

Alternatively, this can also be done in Proxmox’s web console under Node > Disks > ZFS > Create, which creates the zpool and the storage entry in one step. That’s far easier than the CLI, but it’s good to know what’s happening behind the pretty web console.
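
There’s also a middle ground: Proxmox’s pvesm tool can register storage from the CLI without hand-editing storage.cfg. Something along these lines should be equivalent for ssd-mirror (worth double-checking the options against pvesm’s man page; I didn’t go this route):

root@r720$ pvesm add zfspool ssd-mirror --pool ssd-mirror --content images,rootdir --nodes r720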

Each storage type in Proxmox restricts the types of content it can hold. For instance, the zfspool type can only hold images or rootdir, which are VM disk images and container root directories. For ssd-mirror this is perfect, because that’s exactly the workload it’s for. For the rusty-z2 pool, we’ll need a different storage type.

To set that up, create a new ZFS dataset called pve under rusty-z2:

root@r720$ zfs create rusty-z2/pve

Again, to use this in Proxmox it must be added to /etc/pve/storage.cfg:

dir: rusty-dir
  path /rusty-z2/pve
  content backup,snippets,iso,vztmpl
  nodes r720
  prune-backups keep-all=1

Notice the dir type with a content of backup,snippets,iso,vztmpl. Once this is done, all of the storage will appear in the web console under the r720 node:

r720 storage in proxmox console

ZFS shared over NFS

It would be really convenient if the NUC could access the rusty-dir storage, so that it could use that redundant storage for backups and share ISOs, container templates, snippets, etc. With ZFS and NFS this is dead simple.

Install an NFS server on the r720:

root@r720$ apt install nfs-kernel-server

Share the dataset over NFS:

root@r720$ zfs set sharenfs='rw' rusty-z2/pve
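
The export can be checked on the r720, and optionally tightened to the local subnet instead of a blanket rw. A quick sketch (the 192.168.1.0/24 restriction is an assumption about my LAN, not something I actually configured above):

root@r720$ exportfs -v # The ZFS share should show up in the active NFS exports
root@r720$ showmount -e localhost # Another way to list what is being exported
root@r720$ zfs set sharenfs='rw=@192.168.1.0/24' rusty-z2/pve # Restrict the export to the LAN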

On the NUC node, add the following to /etc/pve/storage.cfg:

nfs: rusty-nfs
  export /rusty-z2/pve
  path /mnt/pve/rusty-nfs
  server 192.168.1.100
  content backup,snippets,iso,vztmpl
  nodes nuc
  prune-backups keep-all=1

In the Proxmox console, the new NFS storage should appear under the NUC node:

nuc storage in proxmox console
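
The same can be confirmed from the NUC’s shell; a quick sketch using pvesm (output omitted since it varies):

root@nuc$ pvesm status --storage rusty-nfs # Should report the NFS storage as active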

As a quick test, on the r720 node, download a container template:

root@r720$ pveam download rusty-dir ubuntu-20.04-standard_20.04-1_amd64.tar.gz

And on the NUC node, it should appear via the corresponding NFS storage:

root@nuc$ pveam list rusty-nfs
NAME                                                         SIZE
rusty-nfs:vztmpl/ubuntu-20.04-standard_20.04-1_amd64.tar.gz  204.28MB

Now the NUC has access to redundant storage over NFS.

Next

The machines are running, storage is configured and the cluster is ready for some workloads, but before that it’d be a good idea to automate some of the preflight tasks. In the next part, I’ll take a look at Ansible to harden access and handle the rest of the post-install configuration.
