HA iSCSI Target with DRBD (2 node cluster how-to)


HA iSCSI Target with DRBD (2 node cluster how-to)

Post by hunter86_bg » 2017/12/28 22:17:36

Hey Community,

Since I have seen several questions about iSCSI in a corosync/pacemaker environment, I have decided to create a "short" how-to.
I would appreciate any feedback (including typos).

Prerequisites:
A Minimal Install of CentOS 7.4.1708 on 2 nodes

Packages I have used:

Code: Select all

# rpm -qa | grep -E "fence|pcs|targetcli|drbd" | sort
drbd90-utils-9.1.0-1.el7.elrepo.x86_64
drbd90-utils-sysvinit-9.1.0-1.el7.elrepo.x86_64
fence-agents-all-4.0.11-66.el7_4.3.x86_64
fence-agents-apc-4.0.11-66.el7_4.3.x86_64
fence-agents-apc-snmp-4.0.11-66.el7_4.3.x86_64
fence-agents-bladecenter-4.0.11-66.el7_4.3.x86_64
fence-agents-brocade-4.0.11-66.el7_4.3.x86_64
fence-agents-cisco-mds-4.0.11-66.el7_4.3.x86_64
fence-agents-cisco-ucs-4.0.11-66.el7_4.3.x86_64
fence-agents-common-4.0.11-66.el7_4.3.x86_64
fence-agents-compute-4.0.11-66.el7_4.3.x86_64
fence-agents-drac5-4.0.11-66.el7_4.3.x86_64
fence-agents-eaton-snmp-4.0.11-66.el7_4.3.x86_64
fence-agents-emerson-4.0.11-66.el7_4.3.x86_64
fence-agents-eps-4.0.11-66.el7_4.3.x86_64
fence-agents-hpblade-4.0.11-66.el7_4.3.x86_64
fence-agents-ibmblade-4.0.11-66.el7_4.3.x86_64
fence-agents-ifmib-4.0.11-66.el7_4.3.x86_64
fence-agents-ilo2-4.0.11-66.el7_4.3.x86_64
fence-agents-ilo-moonshot-4.0.11-66.el7_4.3.x86_64
fence-agents-ilo-mp-4.0.11-66.el7_4.3.x86_64
fence-agents-ilo-ssh-4.0.11-66.el7_4.3.x86_64
fence-agents-intelmodular-4.0.11-66.el7_4.3.x86_64
fence-agents-ipdu-4.0.11-66.el7_4.3.x86_64
fence-agents-ipmilan-4.0.11-66.el7_4.3.x86_64
fence-agents-kdump-4.0.11-66.el7_4.3.x86_64
fence-agents-mpath-4.0.11-66.el7_4.3.x86_64
fence-agents-rhevm-4.0.11-66.el7_4.3.x86_64
fence-agents-rsa-4.0.11-66.el7_4.3.x86_64
fence-agents-rsb-4.0.11-66.el7_4.3.x86_64
fence-agents-sbd-4.0.11-66.el7_4.3.x86_64
fence-agents-scsi-4.0.11-66.el7_4.3.x86_64
fence-agents-vmware-soap-4.0.11-66.el7_4.3.x86_64
fence-agents-wti-4.0.11-66.el7_4.3.x86_64
fence-virt-0.3.2-12.el7.x86_64
kmod-drbd90-9.0.9-1.el7_4.elrepo.x86_64
pcs-0.9.158-6.el7.centos.1.x86_64
targetcli-2.1.fb46-1.el7.noarch
1. Enable the 'elrepo' repository on both nodes:

Code: Select all

yum -y install http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
2. Install the relevant packages:

Code: Select all

yum -y install fence-agents-all pcs targetcli "*drbd90*" vim-enhanced bash-completion net-tools bind-utils mlocate setroubleshoot-server policycoreutils-{python,devel}
3. Prepare the block device for DRBD (preferably use LVM, as it now supports lvmraid with reshape and takeover).
In my case I'm adding one qcow2 disk of the same size to each machine, with a serial number (cat /dev/urandom | tr -cd A-Za-z0-9 | head -c 32 ; echo) defined for it (see the libvirt sketch below):

Code: Select all

vgcreate drbd /dev/disk/by-id/virtio-YOUR_SERIAL_NUMBER
lvcreate -l 100%FREE -n drbd0 drbd
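For reference, here is one way such a qcow2 disk with a fixed serial number could be attached to each VM with libvirt. This is a minimal sketch: the image path, size, VM name and target device are hypothetical; the serial ends up as /dev/disk/by-id/virtio-<serial> inside the guest.

Code: Select all

# generate a random serial (same command as mentioned above)
SERIAL=$(cat /dev/urandom | tr -cd A-Za-z0-9 | head -c 32 ; echo)

# create the backing qcow2 image (hypothetical path and size)
qemu-img create -f qcow2 /var/lib/libvirt/images/drbd5-centos-drbd.qcow2 10G

# disk definition carrying the serial number
cat > /tmp/drbd-disk.xml <<EOF
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/drbd5-centos-drbd.qcow2'/>
  <target dev='vdb' bus='virtio'/>
  <serial>${SERIAL}</serial>
</disk>
EOF

# attach it permanently to the VM
virsh attach-device drbd5-centos /tmp/drbd-disk.xml --persistent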
4. Create your DRBD configuration (example based on 'man 5 drbd.conf-9.0'), e.g. in /etc/drbd.d/drbd0.res:

Code: Select all

resource drbd0 {
                 net {
                      cram-hmac-alg sha1;
                      shared-secret "FooFunFactory";
                 }
                 volume 0 {
                      device    /dev/drbd0;
                      disk      /dev/drbd/drbd0;
                      meta-disk internal;
                 }
                 on drbd5-centos {
                      node-id   0;
                      address   192.168.122.80:7000;
                 }
                 on drbd6-centos {
                      node-id   1;
                      address   192.168.122.81:7000;
                 }
                 connection {
                      host      drbd5-centos  port 7000;
                      host      drbd6-centos  port 7000;
                      net {
                          protocol C;
                      }
                 }
           }
Use the same config on both nodes, as we use the same LV.
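To sanity-check the configuration syntax on each node before bringing the resource up, you can dump the parsed config (optional):

Code: Select all

drbdadm dump drbd0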

Prepare the local block device for DRBD replication (on both nodes):

Code: Select all

drbdadm create-md drbd0
drbdadm up drbd0
When you check the status, it should still be in the "Connecting" state, as follows:

Code: Select all

#drbdadm status
drbd0 role:Secondary
  disk:Inconsistent
  drbd6-centos connection:Connecting
5. Prepare a firewalld service.
Copy an existing firewalld service definition as a starting point:

Code: Select all

cp /usr/lib/firewalld/services/ssh.xml /etc/firewalld/services/drbd0.xml
Edit it until it looks like this:

Code: Select all

<?xml version="1.0" encoding="utf-8"?>
<service>
  <short>DRBD0</short>
  <description>DRBD0 service</description>
  <port protocol="tcp" port="7000"/>
</service>
Reload firewalld so it picks up the new service definition, then permanently add the needed services and reload again:

Code: Select all

firewall-cmd --reload && firewall-cmd --permanent --add-service={drbd0,high-availability,iscsi-target} && firewall-cmd --reload
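To confirm that the services are now active in both the runtime and the permanent configuration (a quick sanity check, not part of the original steps):

Code: Select all

firewall-cmd --list-services
firewall-cmd --permanent --list-services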
Now the DRBD status should look like this:

Code: Select all

drbdadm status
drbd0 role:Secondary
  disk:Inconsistent
  drbd6-centos role:Secondary
    peer-disk:Inconsistent
Force one of the nodes to become primary, which will sync the nodes:

Code: Select all

drbdadm --force primary drbd0
Wait until the sync reaches 100%. You can monitor the progress with 'drbdadm status'.

6. Set SELinux to permissive mode. Do not disable SELinux completely, as then we would have no AVC denials to analyse later.

Code: Select all

setenforce 0
7. Prepare the cluster (I'm using fence_xvm for STONITH).
Enable pcsd on both nodes:

Code: Select all

systemctl enable --now pcsd
Change the password of 'hacluster' user on both nodes:

Code: Select all

echo centos | passwd --stdin hacluster
Authenticate both pcs daemons:

Code: Select all

pcs cluster auth drbd5-centos drbd6-centos
Note: use the user 'hacluster' with the password from the previous step.
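If you prefer a non-interactive variant, the credentials can also be passed on the command line (same user and password as above):

Code: Select all

pcs cluster auth drbd5-centos drbd6-centos -u hacluster -p centos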

Build the cluster:

Code: Select all

pcs cluster setup --start --enable --name CentOS-DRBD-iSCSI drbd5-centos drbd6-centos --transport udpu --wait_for_all=1 --encryption 1
Note: DNS resolution is required. For the node names use the output of 'uname -n'.
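If you have no DNS records for the nodes, static /etc/hosts entries on both nodes are enough (using the node names and addresses from the DRBD config above):

Code: Select all

cat >> /etc/hosts <<EOF
192.168.122.80 drbd5-centos
192.168.122.81 drbd6-centos
EOF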

Build your STONITH. I'm skipping this part.
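Since the STONITH setup itself is skipped here, the following is only a rough sketch of what a fence_xvm-based configuration might look like for two libvirt guests. It assumes fence_virtd is already configured on the hypervisor and the key file has been distributed to the guests; the resource names are hypothetical.

Code: Select all

# from a guest, check that the hypervisor's fence_virtd answers
fence_xvm -o list

# one STONITH device per node (hypothetical resource names)
pcs stonith create fence-drbd5 fence_xvm port=drbd5-centos pcmk_host_list=drbd5-centos
pcs stonith create fence-drbd6 fence_xvm port=drbd6-centos pcmk_host_list=drbd6-centos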
Once your STONITH is working, test it via:

Code: Select all

pcs stonith fence NODE_NAME
If you do not plan to use a STONITH device (which is HIGHLY discouraged), disable the 'stonith-enabled' property, or no resources will be started.
To check the current value, run:

Code: Select all

pcs property show --all | grep stonith-enabled
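Disabling it (again, only if you really cannot have a STONITH device) would be:

Code: Select all

pcs property set stonith-enabled=false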
8. DRBD cluster resource configuration

Code: Select all

pcs cluster cib /root/cluster
pcs -f /root/cluster resource create DRBD0 ocf:linbit:drbd drbd_resource=drbd0
pcs -f /root/cluster resource master MASTER-DRBD0 DRBD0 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
pcs cluster cib-push /root/cluster
Note: The master will be promoted once a resource depends on it.

9. Create the iscsi-ip resource in the 'iscsi' group

Code: Select all

pcs resource create iscsi-ip ocf:heartbeat:IPaddr2 ip=192.168.122.244 cidr_netmask=24 --group iscsi
10. Colocation and order constraints for the 'iscsi' group

Code: Select all

rm -f /root/cluster
pcs cluster cib /root/cluster
pcs -f /root/cluster constraint order promote MASTER-DRBD0 then start iscsi Mandatory id=iscsi-always-after-master-drbd
pcs -f /root/cluster constraint colocation add iscsi with master MASTER-DRBD0 INFINITY id=iscsi-group-where-master-drbd
pcs cluster cib-push /root/cluster
If the IP is not up and no master has been promoted, check whether SELinux is in permissive mode (we will fix this properly later).
Test resource migration via:

Code: Select all

pcs node standby && sleep 60 && pcs node unstandby
If your constraints do not work properly, you might end up with iscsi-ip on the recently un-standby-ed node. Check your colocation constraint!
Also consider setting some stickiness via:

Code: Select all

pcs property set default-resource-stickiness=100
11. Enable target.service on both nodes.
Note: If you skip this, there will be issues with a missing '/sys/kernel/config/target'.

Code: Select all

systemctl enable --now target.service
12. Add the iSCSI target and LUN resources to the 'iscsi' group

Code: Select all

rm -f /root/cluster
pcs cluster cib /root/cluster
pcs -f /root/cluster resource create iscsi-target ocf:heartbeat:iSCSITarget iqn="iqn.2018-01.com.example:centos" allowed_initiators="iqn.2018-01.com.example:kalinsg01"   --group iscsi
pcs -f /root/cluster resource create iscsi-lun0 ocf:heartbeat:iSCSILogicalUnit target_iqn=iqn.2018-01.com.example:centos lun=0 path=/dev/drbd0 --group iscsi
pcs cluster cib-push /root/cluster
Notes:
A) target_iqn must match the iqn defined during the creation of the target resource.
B) If no allowed_initiators are defined for the 'ocf:heartbeat:iSCSITarget' resource, everyone is allowed to access the iSCSI target.
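Once the group is running on the DRBD master, you can inspect (read-only) what the resource agents created on the active node. As discussed further down in this thread, do not configure the target by hand with targetcli in a cluster:

Code: Select all

targetcli ls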

13. Service relocation test.
The relocation is needed in order to generate AVC denials in /var/log/audit/audit.log.
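One way to trigger the relocation is the same standby/unstandby cycle used in step 10 (run it on the node currently hosting the 'iscsi' group):

Code: Select all

pcs node standby && sleep 60 && pcs node unstandby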

14. SELinux policy generation:
Analyse the audit log via:

Code: Select all

sealert -a /var/log/audit/audit.log
You will see the following recommendations:

Code: Select all

setsebool -P domain_kernel_load_modules 1
setsebool -P daemons_enable_cluster_mode 1
Once you execute them (on both nodes), stop the cluster via:

Code: Select all

pcs cluster stop --all
And reboot both nodes simultaneously.
If the cluster doesn't come up, clean up the audit.log, set SELinux to permissive again and repeat step 14.
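A minimal sketch of that clean-up (one way to do it; rotating the log keeps the old denials in audit.log.1 rather than deleting them):

Code: Select all

setenforce 0
# rotate the audit log so only new denials end up in audit.log
service auditd rotate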

15. Now discover the iSCSI target and log in to it from the allowed initiator (for help use "man iscsiadm").
Edit your initiator name:

Code: Select all

# cat /etc/iscsi/initiatorname.iscsi 
InitiatorName=iqn.2018-01.com.example:kalinsg01
Restart the iscsi daemon:

Code: Select all

systemctl restart iscsid.service
Discover the HA-iSCSI:

Code: Select all

iscsiadm --mode discoverydb --type sendtargets --portal 192.168.122.244 --discover
Note: Use an IP, as there were some bugs in RHEL 7.0

Login to the HA-iSCSI:

Code: Select all

iscsiadm --mode node --targetname  iqn.2018-01.com.example:centos --portal 192.168.122.244:3260 --login
Verify your iscsi device via 'lsscsi':

Code: Select all

#lsscsi
[4:0:0:0]    disk    LIO-ORG  iscsi-lun0       4.0   /dev/sdc
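You can also double-check the session from the initiator side:

Code: Select all

iscsiadm --mode session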


Re: HA iSCSI Target with DRBD (2 node cluster how-to)

Post by hunter86_bg » 2018/01/28 21:00:27

After some testing, I have noticed that setting 'allowed_initiators' in the iscsi-lun0 resource did not work, so I have modified step 12 to reflect a working solution.

If the iSCSI initiators will use iscsi-lun0 as an LVM PV, then we should add the device to the 'global_filter' setting in lvm.conf on the DRBD cluster nodes; otherwise the DRBD device will be kept primary on both nodes, and this will cause havoc in your cluster.
Here is a short example:

Code: Select all

[root@drbd1-rhel ~]# grep global_filter /etc/lvm/lvm.conf
	# Configuration option devices/global_filter.
	# Use global_filter to hide devices from these LVM system components.
	# global_filter are not opened by LVM.
global_filter = [ "r|/dev/drbd/drbd0|", "r|/dev/drbd0|" ]
	# devices/global_filter.
Note: It is not necessary to rebuild the initramfs via 'dracut', as we do not bring up the DRBD device before the cluster is up and running, but it is still nice to keep everything consistent.
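If you do decide to regenerate it anyway for consistency, the usual command for the running kernel is:

Code: Select all

dracut -f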


Re: HA iSCSI Target with DRBD (2 node cluster how-to)

Post by ladam@ictuniverse.eu » 2019/07/14 12:39:58

Hi,

I'm trying to set up this kind of HA on CentOS 7, but my problem is linked to the iSCSI config. With targetcli, I created this on the first node:
/> ls
o- / ......................................................................................................................... [...]
o- backstores .............................................................................................................. [...]
| o- block .................................................................................................. [Storage Objects: 1]
| | o- hdd001 ...................................................................... [/dev/drbd0 (20.0GiB) write-thru deactivated]
| | o- alua ................................................................................................... [ALUA Groups: 1]
| | o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
| o- fileio ................................................................................................. [Storage Objects: 0]
| o- pscsi .................................................................................................. [Storage Objects: 0]
| o- ramdisk ................................................................................................ [Storage Objects: 0]
o- iscsi ............................................................................................................ [Targets: 1]
| o- iqn.2019-07.ict:sn.1234567890 ..................................................................................... [TPGs: 1]
| o- tpg1 ............................................................................................... [no-gen-acls, no-auth]
| o- acls .......................................................................................................... [ACLs: 1]
| | o- iqn.1998-01.com.vmware:esx12-6f0bfd40 ................................................................ [Mapped LUNs: 0]
| o- luns .......................................................................................................... [LUNs: 0]
| o- portals .................................................................................................... [Portals: 1]
| o- 192.168.14.244:3260 .............................................................................................. [OK]
o- loopback ......................................................................................................... [Targets: 0]
/>

but when I try to do the same on the second node, I get:
/backstores/block> create hdd001 /dev/drbd0
Cannot configure StorageObject because device /dev/drbd0 is already in use


Any idea?

Thanks


Re: HA iSCSI Target with DRBD (2 node cluster how-to)

Post by TrevorH » 2019/07/14 13:05:47

What is on your drbd0 device? Is it an LVM PV? In which case, did you adjust the filter in lvm.conf to exclude it?


Re: HA iSCSI Target with DRBD (2 node cluster how-to)

Post by hunter86_bg » 2019/07/14 13:12:52

In a clustered environment you should not use targetcli directly; provide all the details to the cluster instead.
Sadly, it took Red Hat more than a year to fix a bug in the iSCSI resource agent...
As far as I remember, just follow the guide. I'm using LVs with the same name on both nodes, so the DRBD config is quite straightforward.

Maybe I will find time for a 3-node DRBD cluster.


Re: HA iSCSI Target with DRBD (2 node cluster how-to)

Post by ladam@ictuniverse.eu » 2019/07/28 15:07:41

Hi,

I just followed your guide, but it is still not working.
Maybe I have to create the iSCSI target and LUN beforehand, but how?

Thanks, Laurent

********************************************
System info
********************************************
[root@node1 ~]# uname -a
Linux node1.securecloud.lu 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
SELinux disabled
firewalld disabled

[root@node1 ~]# lsscsi
[0:0:0:0] disk VMware Virtual disk 1.0 /dev/sda
[0:0:1:0] disk VMware Virtual disk 1.0 /dev/sdb
[0:0:2:0] disk VMware Virtual disk 1.0 /dev/sdc
[2:0:0:0] cd/dvd NECVMWar VMware IDE CDR10 1.00 /dev/sr0

[root@node2 ~]# lsscsi
[0:0:0:0] disk VMware Virtual disk 1.0 /dev/sda
[0:0:1:0] disk VMware Virtual disk 1.0 /dev/sdb
[0:0:2:0] disk VMware Virtual disk 1.0 /dev/sdc
[2:0:0:0] cd/dvd NECVMWar VMware IDE CDR10 1.00 /dev/sr0
[root@node2 ~]#

[root@node2 ~]# pcs resource show
Master/Slave Set: MASTER-DRBD0 [DRBD0]
Masters: [ node1 ]
Slaves: [ node2 ]
Resource Group: iscsi
iscsi-ip (ocf::heartbeat:IPaddr2): Started node2
iscsi-target (ocf::heartbeat:iSCSITarget): Stopped
iscsi-lun0 (ocf::heartbeat:iSCSILogicalUnit): Stopped
[root@node2 ~]#

[root@node2 ~]# pcs cluster status
Cluster Status:
Stack: corosync
Current DC: node2 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Sun Jul 28 11:05:20 2019
Last change: Sun Jul 28 10:47:50 2019 by root via cibadmin on node1
2 nodes configured
5 resources configured

PCSD Status:
node2: Online
node1: Online
********************************************
Install process
********************************************

yum -y install http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
yum -y install fence-agents-all pcs targetcli "*drbd90*" vim-enhanced bash-completion net-tools bind-utils mlocate setroubleshoot-server policycoreutils-{python,devel} iscsi-initiator-utils


vgcreate drbd /dev/sdb
lvcreate -l 100%FREE -n drbd0 drbd
vi /etc/lvm/lvm.conf

# Configuration option devices/global_filter.
# Use global_filter to hide devices from these LVM system components.
# global_filter are not opened by LVM.


global_filter = [ "r|/dev/drbd/drbd0|", "r|/dev/drbd0|" ]
vi /etc/drbd.d/drbd0.res

resource drbd0 {
                 net {
                      cram-hmac-alg sha1;
                      shared-secret "FooFunFactory";
                 }
                 volume 0 {
                      device    /dev/drbd0;
                      disk      /dev/drbd/drbd0;
                      meta-disk internal;
                 }
                 on node1 {
                      node-id   0;
                      address   192.168.14.60:7000;
                 }
                 on node2 {
                      node-id   1;
                      address   192.168.14.61:7000;
                 }
                 connection {
                      host      node1  port 7000;
                      host      node2  port 7000;
                      net {
                          protocol C;
                      }
                 }
           }


drbdadm create-md drbd0
drbdadm up drbd0

drbdadm --force primary drbd0


systemctl enable --now iscsid.service
systemctl enable --now pcsd
echo centos | passwd --stdin hacluster
pcs cluster auth node1 node2

pcs cluster setup --start --enable --name CentOS-DRBD-iSCSI node1 node2 --transport udpu --wait_for_all=1 --encryption 1

pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore

pcs cluster cib /root/cluster
pcs -f /root/cluster resource create DRBD0 ocf:linbit:drbd drbd_resource=drbd0
pcs -f /root/cluster resource master MASTER-DRBD0 DRBD0 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
pcs cluster cib-push /root/cluster

pcs resource create iscsi-ip ocf:heartbeat:IPaddr2 ip=192.168.14.244 cidr_netmask=24 --group iscsi



rm -f /root/cluster
pcs cluster cib /root/cluster
pcs -f /root/cluster resource create iscsi-target ocf:heartbeat:iSCSITarget iqn="iqn.2019-07.ict:sn.1234567890" allowed_initiators="iiqn.1994-05.com.redhat:8aa7c47c8f42" --group iscsi
pcs -f /root/cluster resource create iscsi-lun0 ocf:heartbeat:iSCSILogicalUnit target_iqn=iqn.2019-07.ict:sn.1234567890 lun=0 path=/dev/drbd0 --group iscsi
pcs cluster cib-push /root/cluster


Re: HA iSCSI Target with DRBD (2 node cluster how-to)

Post by hunter86_bg » 2019/07/28 15:35:22

What version of the resource agents are you using?
I think the fix for the iSCSI bug I reported is still not available in CentOS.
Try the debug procedure described in https://bugzilla.redhat.com/show_bug.cgi?id=1598969
If you see the same error, you can use the workaround until Red Hat publishes the fix.
Edit: According to Bugzilla, it should be fixed in 'resource-agents-4.1.1-20.el7'.

Edit2:
allowed_initiators="iiqn.1994-05.com.redhat:8aa7c47c8f42" --group iscsi
I hope this is just a typo from copying it to the forum. IQNs should always start with 'iqn.', not 'iiqn.'.

Edit3: Your ordering rules are messed up.
This should never happen:
iscsi-ip (ocf::heartbeat:IPaddr2): Started node2
Check your rules, as the group should start on the master (in your case node1).
Check step 10 for the order & colocation constraints.
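For reference, the currently configured constraints can be reviewed with:

Code: Select all

pcs constraint show --full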


Re: HA iSCSI Target with DRBD (2 node cluster how-to)

Post by TrevorH » 2019/07/28 16:05:07

Edit: According to bugzilla, it should be fixed in 'resource-agents-4.1.1-20.el7'
Current is 4.1.1-12.el7_6.8. I suspect that's a 7.7 update.


Re: HA iSCSI Target with DRBD (2 node cluster how-to)

Post by ladam@ictuniverse.eu » 2019/07/28 16:37:49

What version of the resources are you using?

How can I check that?

Regards


Re: HA iSCSI Target with DRBD (2 node cluster how-to)

Post by ladam@ictuniverse.eu » 2019/07/28 16:46:42

Sorry,

resource-agents-4.1.1-12.el7_6.8.x86_64
Laurent
