HA in Linux is pretty easy


NOTICE -- Very important:
  • Please do not fat-finger anything. Verify that your IP addresses/hostnames are correct before configuring PCS.
  • If any service does not start up correctly, check the service logs under /var/log/cluster or /var/log/pcsd. journalctl -xe does not help at all here.
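
For example, a quick pre-flight check (the node names are placeholders; use your own):

  • hostnamectl   (confirm each node's own hostname)
  • getent hosts <NODE-1> <NODE-2>   (confirm both names resolve, via DNS or /etc/hosts)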


1> Install PCS

  • yum install pcs -y
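
If firewalld is active, the cluster ports must also be open (pcsd uses TCP 2224; corosync uses UDP 5404/5405). EL7's firewalld ships a predefined service that covers them:

  • firewall-cmd --permanent --add-service=high-availability
  • firewall-cmd --reload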


2> Configure PCS. Please run the following commands on ALL nodes:

  • systemctl start pcsd
  • systemctl enable pcsd
  • passwd hacluster
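
The auth step below logs in as this hacluster account, so give it the same password on every node. If you are scripting the setup, EL7's passwd accepts --stdin (the password shown is only an example):

  • echo 'MyS3curePass' | passwd --stdin hacluster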

3> Configure PCS. Run the following commands on ONE node only (a full worked example follows this list):
  • pcs cluster auth <NODE-1> <NODE-2> <...>
  • pcs cluster setup --name <CLUSTER_NAME> <NODE-1> <NODE-2> <...>
  • pcs cluster start --all
  • pcs cluster enable --all
  • pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=<IP_ADDR> cidr_netmask=32 op monitor interval=30s
  • pcs resource create <APPLICATION_MON> ocf:heartbeat:nginx configfile=/etc/nginx/nginx.conf op monitor timeout="5s" interval="5s"   (Note: I am using nginx as the application here.)
  • pcs property set stonith-enabled=false
  • pcs property set no-quorum-policy=ignore
  • pcs constraint colocation add <APPLICATION_MON> with virtual_ip INFINITY
  • pcs constraint order virtual_ip then <APPLICATION_MON>
  • pcs cluster stop --all
  • pcs cluster start --all
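
As a worked example, here is the whole step-3 sequence for a two-node nginx cluster. The node names, cluster name, and resource names match the sample output below; the VIP 192.168.1.100 is a placeholder, so substitute your own:

pcs cluster auth node1 node2 -u hacluster
pcs cluster setup --name csr-proxy-ha node1 node2
pcs cluster start --all
pcs cluster enable --all
pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.168.1.100 cidr_netmask=32 op monitor interval=30s
pcs resource create proxy ocf:heartbeat:nginx configfile=/etc/nginx/nginx.conf op monitor timeout="5s" interval="5s"
pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore
pcs constraint colocation add proxy with virtual_ip INFINITY
pcs constraint order virtual_ip then proxy
pcs cluster stop --all
pcs cluster start --all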



The final output should look like:

[root@node1 nginx]# pcs status
Cluster name: csr-proxy-ha
Stack: corosync
Current DC: node1 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Nov 15 10:59:40 2018
Last change: Thu Nov 15 10:58:13 2018 by root via cibadmin on node1

2 nodes configured
2 resources configured

Online: [ node1 node2 ]

Full list of resources:

 virtual_ip     (ocf::heartbeat:IPaddr2):       Started node1
 proxy  (ocf::heartbeat:nginx): Starting node1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
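
Once both resources show Started, it is worth testing a failover. One simple way (pcs 0.9 syntax, as shipped on EL7) is to put the active node into standby and watch pcs status; the VIP and nginx should move to the other node:

pcs cluster standby node1
pcs status
pcs cluster unstandby node1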




Additional troubleshooting:
ERROR message:

[root@NODE1 ~]# pcs cluster start --all
NODE2: Error connecting to NODE2 - (HTTP error: 400)
NODE1: Starting Cluster...
Error: unable to start all nodes
NODE2: Error connecting to NODE2 - (HTTP error: 400)

Cause:
A misspelled or unresolvable IP address/hostname. Check that every node name is spelled correctly and resolves.

You can verify with this command:
host <node1>
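
If the names resolve correctly but the HTTP 400 error persists, also confirm that pcsd is running and listening on TCP port 2224 on the remote node (ss is part of the iproute package on EL7):

systemctl status pcsd
ss -tnlp | grep 2224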



ERROR message:
 virtual_ip     (ocf::heartbeat:IPaddr2):       Stopped

Failed Actions:
* virtual_ip_start_0 on NODE1 'unknown error' (1): call=6, status=complete, exitreason='Unable to find nic or netmask.',
    last-rc-change='Thu Nov 15 10:31:55 2018', queued=0ms, exec=454ms
* virtual_ip_start_0 on NODE2 'unknown error' (1): call=6, status=complete, exitreason='Unable to find nic or netmask.',

Cause:
The VIP is not inside any subnet configured on the node. Check that the VIP is in the same subnet as an interface on each node.
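
To compare, list the addresses and subnets configured on each node:

ip addr show

If the VIP genuinely sits outside those subnets, the IPaddr2 agent cannot derive the interface on its own; its nic parameter lets you pin one explicitly (eth0 here is only an example):

pcs resource update virtual_ip nic=eth0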


