Proxmox - Smartest ZFS Pool Replication Process Across Cluster? (lemmy.world)

submitted 23 hours ago* (last edited 21 hours ago) by pr0927@lemmy.world to c/selfhosted@lemmy.world


Hi all, I have been chasing answers for this for months on the Proxmox forums, Reddit, and the LevelOneTechs forums and haven't gotten much guidance, unfortunately. Hoping Lemmy will be the magic solution!

Perhaps I couched my initial thread in too much detail, so after some digging I got to more focused questions for my follow-up (effectively what this thread is), but I still didn't get much of a response!

In all this time, I even have one more random thought I haven't asked elsewhere - if I "Add Storage" of my ZFS pools in Proxmox, even though the categories don't really fit data storage (the categories are like VM data, CT data, ISOs, snippets, etc.), then I could attach these to a VM or CT and replicate them via "normal" Proxmox cluster replication - is it OK to add such data pools to this storage section as such?

Anyway, to the main course - the summary of what I'm seeking help on is below.

Long story short:

- 3 nodes in a cluster, using ZFS.
  - CTs and VMs are replicated across nodes via GUI.
- I want to replicate data in ZFS pools (which my CTs and VMs use - CTs through bind-mounts, VMs through NFS shares) to the other nodes.
  - Currently using Sanoid/Syncoid to make this happen from one node to two others via cronjob.
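
For readers unfamiliar with that kind of setup, here is a minimal sketch of what a one-to-two Syncoid job might look like when driven from a script. The dataset name `tank/data` and the hostnames `pve2` and `pve3` are hypothetical, and the choice of `--no-sync-snap` (reuse existing snapshots rather than create new ones) is an assumption, not necessarily what is configured here. The script only builds and prints the commands; it does not run them.

```python
import shlex

def build_syncoid_commands(dataset, targets, user="root"):
    """Return one syncoid command (as an argv list) per target node."""
    commands = []
    for host in targets:
        commands.append([
            "syncoid",
            "--recursive",     # also replicate child datasets
            "--no-sync-snap",  # assumption: reuse existing snapshots, create none
            dataset,
            f"{user}@{host}:{dataset}",  # same dataset path on the target
        ])
    return commands

if __name__ == "__main__":
    # Hypothetical dataset and hostnames, for illustration only.
    for cmd in build_syncoid_commands("tank/data", ["pve2", "pve3"]):
        print(shlex.join(cmd))
```

Each printed line could then be placed in a cronjob on the sending node.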

So three questions:

1. If I do Sanoid/Syncoid on all three nodes (to each other) - is this stupid, and will it fail - or will each node recognize snapshots for a ZFS pool and incrementally update if needed (even if the snapshot was made on/by a different node)?
   - As a sub-question to this - and kind of the point of my overall thread and the previous one - is this even a sensible way to approach this, or is there a better way?
2. For the GUI-based replication tasks, since I have CTs replicating to other systems, if I unchecked "skip replication" for the bind-mounted ZFS pools - would this accomplish the same thing? Or would it fail outright? I seem to remember needing to check this for some reason.
3. Is PVE-zsync suitable for my situation? I see mention of no recursive syncing; I don't fully know what that means, or whether it's a dealbreaker. I suppose if this is the correct choice - then I need to delete my current GUI-based CT/VM replication tasks?
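
As background for the first question: an incremental ZFS send needs a snapshot that exists on both the source and the target, and a snapshot keeps the same GUID when it is sent and received, whichever node created it. The sketch below is a simplified model of picking the newest common snapshot, not Syncoid's actual code; the `Snapshot` type and the sample names and numbers are hypothetical.

```python
from typing import NamedTuple, Optional

class Snapshot(NamedTuple):
    name: str
    guid: int       # preserved across send/receive
    createtxg: int  # creation order on the pool that made it

def newest_common_snapshot(source, target) -> Optional[Snapshot]:
    """Pick the most recent source snapshot that the target also holds."""
    target_guids = {s.guid for s in target}
    common = [s for s in source if s.guid in target_guids]
    # No common snapshot means an incremental send is not possible.
    return max(common, key=lambda s: s.createtxg, default=None)

if __name__ == "__main__":
    src = [Snapshot("data@a", 111, 10), Snapshot("data@b", 222, 20),
           Snapshot("data@c", 333, 30)]
    dst = [Snapshot("data@a", 111, 10), Snapshot("data@b", 222, 20)]
    base = newest_common_snapshot(src, dst)
    print(base.name if base else "full send needed")
```

Under this model, a snapshot made on one node and replicated to the others can serve as the incremental base anywhere. The risky case is writing to the same dataset on more than one node: once copies diverge, the receiving side has to be rolled back to the common snapshot, which discards its newer changes.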

For those with immense patience, here was the original thread with more detail:

Hi all, so I set up three Proxmox servers (two identical, one “analogous”) - and the basics about the setup are as follows:

VMs and CTs are replicated every 15 minutes from one node to the other two.

One CT runs Cockpit for SMB shares - bind-mounted to the ZFS pools with the datasets that are SMB-shared.

I use this for accessing folders via GUI over the network from my PC.

One CT runs an NFS server (no special GUI, only CLI) - bind-mounted to the ZFS pools with the datasets that are NFS-shared (same as SMB-shared ones).

Apps that need to tap into data use NFS shares (such as Jellyfin, Immich, Frigate) provided by this CT.

Two VMs are of Debian, running Docker for my apps.

VMs and CTs are all stored on 2x2TB M.2 NVMe SSDs.

Data is stored in folders (per the NFS/SMB shares) on a 4x8TB ZFS pool with specific datasets like Media, Documents, etc. and a 1x4TB SSD ZFS “pool” for Frigate camera footage storage.

Due to having hardware passed-through to the VMs (GPU and Google Coral TPU) and using hardware resource mappings (one node has an Nvidia RTX A2000, two have Nvidia RTX 3050s - can have them all with the same mapped resource node ID to pass-through without issue despite being different GPUs), I don’t have instant HA failover.

Additionally, as I am using ZFS with data on all three separate nodes, I understand that I have a “gap” window in the event of HA where the data on one of the other nodes may not be all the way up-to-date if a failover occurs before a replication.

So after all the above - this brings me to my question - what is the best way to replicate data that is not a VM or a CT, but raw data stored on those ZFS pools for the SMB/NFS shares - from one node to another?

I have been using Sanoid/Syncoid installed on one node itself, with cronjobs. I’m sure I’m not using it perfectly (boy did I have a “fun” time with the config files), and I did have a headache with retention and getting rid of ZFS snapshots in a timely manner to not fill up the drives needlessly - but it seems to be working.
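
To make the retention headache concrete, here is a minimal sketch of an hourly/daily pruning rule: keep the newest snapshot in each of the most recent N hours and M days, and mark everything else for deletion. This is a simplified illustration, not Sanoid's implementation; the function name and the default counts are assumptions.

```python
from datetime import datetime, timedelta

def snapshots_to_prune(timestamps, hourly=24, daily=7):
    """Return timestamps that fall outside a simple hourly/daily policy."""
    keep = set()
    ordered = sorted(timestamps, reverse=True)  # newest first
    for fmt, limit in (("%Y-%m-%d %H", hourly), ("%Y-%m-%d", daily)):
        seen = []
        for ts in ordered:
            bucket = ts.strftime(fmt)
            if bucket in seen:
                continue  # a newer snapshot already covers this bucket
            if len(seen) == limit:
                break     # enough buckets kept for this rule
            seen.append(bucket)
            keep.add(ts)  # newest snapshot in this hour or day
    return [ts for ts in sorted(timestamps) if ts not in keep]

if __name__ == "__main__":
    start = datetime(2024, 12, 25)
    # One snapshot every 15 minutes for two days (hypothetical schedule).
    snaps = [start + timedelta(minutes=15 * i) for i in range(192)]
    print(len(snapshots_to_prune(snaps)), "of", len(snaps), "can be destroyed")
```

With 15-minute snapshots, most of them fall outside such a policy within a day or two, which is why pruning promptly matters for disk usage.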

I just set up the third node (the “analogous” one in specs) which I want to be the active “primary” node and need to copy data over from the other current primary node - I just want to do it intelligently, and then have this node, in its new primary node role, take over the replication of data to the other two nodes.

Would so very, very badly appreciate guidance from those more informed/experienced than me on such topics.


[-] lazynooblet@lazysoci.al 6 points 17 hours ago

Isn't your use case exactly what Ceph is for?

[-] pr0927@lemmy.world 1 points 17 hours ago

I think so, but I already have a great deal of data stored on my existing pools - and I wanted the benefits of ZFS. Additionally, it's my understanding that Ceph isn't ideal unless you have a number of additional node-to-node direct connections - I don't have this - insufficient PCIe slots in each node for additional NICs.

But thanks for flagging - in my original post(s) elsewhere I mentioned in the title that I was seeking to avoid Ceph and GlusterFS, and forgot to mention that here. 🙃

[-] computergeek125@lemmy.world 2 points 11 hours ago

That's the neat part - Ceph can use a full mesh of connections with just a pair of switches and one balance-slb 2-way bond per host. So each host only needs 2 NIC ports (could be on the same NIC, I'm using eno1 and eno2 of my R730's 4-port LOM), and then you plug each of the two ports into one switch (two total for redundancy, in case a switch goes down for maintenance or crash). You just need to make sure the switches have a path to each other at the top.

[-] pr0927@lemmy.world 2 points 51 minutes ago

Yeah sounds cool, but will likely need to be a project for a successor series of servers one day, given other limitations I have with switches, rack space (and money, haha).

[-] computergeek125@lemmy.world 2 points 36 minutes ago

Yeah that's totally fair. I have nearly a kilowatt of real time power draw these days; Rome was not built in a day.

[-] pr0927@lemmy.world 2 points 33 minutes ago

Hahaha I will not be measuring my power draw, don't need a reason for my wife to question my absurd tech antics. xD

[-] lud@lemm.ee 1 points 13 hours ago

If you do want to use Ceph you could probably use NICs with multiple ports per NIC.

Additionally you could use switches instead of a full mesh system.

Full mesh is pretty neat but gets overwhelming very quickly if you have many nodes. With switching you only need two ports per node for your Ceph network.
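
The scaling difference is simple arithmetic, sketched below under the assumption that full mesh means one direct link between every pair of nodes and that the switched design uses two switches for redundancy. The function names are made up for illustration.

```python
def full_mesh_ports_per_node(nodes: int) -> int:
    """Each node needs a direct link to every other node."""
    return nodes - 1

def full_mesh_links(nodes: int) -> int:
    """Total cables: one per pair of nodes."""
    return nodes * (nodes - 1) // 2

def switched_ports_per_node(switches: int = 2) -> int:
    """One port per switch, regardless of cluster size."""
    return switches

if __name__ == "__main__":
    for n in (3, 5, 8):
        print(f"{n} nodes: mesh {full_mesh_ports_per_node(n)} ports/node, "
              f"{full_mesh_links(n)} links; "
              f"switched {switched_ports_per_node()} ports/node")
```

At three nodes the two approaches need the same number of ports per node; the gap only opens up as the cluster grows.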

[-] pr0927@lemmy.world 1 points 11 hours ago

Gotcha - I'll keep that in mind for maybe one day down the road. For now I'm limited in PCIe bandwidth lanes on the NICs I have in (10Gb internal networking) and zero open ports on my switches. :(

this post was submitted on 27 Dec 2024
