Thursday, April 20, 2023

Postgres: ERROR: this node should be a standby

While performing a clone on a standby node, I received an error. I am on the Standby, and nothing is configured incorrectly in repmgr.conf either.

 repmgr -h 10.18.10.11 -U repmgr -d repmgr -f repmgr.conf standby clone



[postgres@center-2 ~]$ repmgr -f repmgr.conf standby register -F
INFO: connecting to local node "centos-2" (ID: 2)
ERROR: this node should be a standby (host=10.18.10.11 port=5432 dbname=repmgr user=repmgr)

The reason: the clone was done without the -F parameter. It should be done with -F to overwrite the original data directory. The error message itself is misleading.

repmgr -h 10.18.10.11 -U repmgr -d repmgr -f /var/lib/pgsql/repmgr.conf standby clone -F

Postgresql: Setup and configuring Postgresql Replication Manager (Repmgr)

This continues from the first part, the Creating Hot Standby blog post.

https://postgresdesk.blogspot.com/2023/04/postgresql-creating-hot-standby.html


Download repmgr from https://repmgr.org/. I will need the repmgr.conf file from the download to modify for my setup.


It is necessary to modify repmgr.conf on both the Master and Standby nodes. The repmgr.conf file can be located anywhere on the VM; I am going to copy mine to /var/lib/pgsql/

/var/lib/pgsql/repmgr.conf


Install repmgr version 13 through yum on both the Master and Slave nodes. That way, it sets up and links all the necessary libraries for my setup.

yum update
yum install repmgr13.x86_64


Modify shared_preload_libraries to load repmgr. Perform this on the Standby node as well.

vi /var/lib/pgsql/13/data/postgresql.conf

shared_preload_libraries = 'repmgr'
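A quick sanity check before restarting is to grep the setting out of postgresql.conf. The sketch below demonstrates the check against a throwaway sample file; on a real node, point CONF at /var/lib/pgsql/13/data/postgresql.conf instead.

```shell
# Demo against a temp file; replace CONF with the real postgresql.conf path.
CONF=$(mktemp)
echo "shared_preload_libraries = 'repmgr'" > "$CONF"

# Match an uncommented shared_preload_libraries line that mentions repmgr.
if grep -Eq "^[[:space:]]*shared_preload_libraries[[:space:]]*=[[:space:]]*'[^']*repmgr" "$CONF"; then
  STATUS="repmgr preloaded"
else
  STATUS="repmgr missing from shared_preload_libraries"
fi
echo "$STATUS"
rm -f "$CONF"
```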

Modify pg_hba.conf and add the following lines for repmgr. Perform this on the Standby node as well.
vi /var/lib/pgsql/13/data/pg_hba.conf

local repmgr repmgr trust
host repmgr repmgr 192.168.1.0/24 trust
local replication repmgr trust
host replication repmgr 192.168.1.0/24 trust
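Note that pg_hba.conf changes take effect on a configuration reload; a full restart is not required for this file. Assuming the service name used elsewhere in this setup, either of these works:

```
systemctl reload postgresql-13.service
# or, from psql as a superuser:
psql -U postgres -c "SELECT pg_reload_conf();"
```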

Modify the repmgr.conf file. The Standby node requires similar changes, so make the same modifications to repmgr.conf on the Standby node, with the node id set to 2. The follow_command entry is crucial. Every modification requires a restart of the postgres database (systemctl restart postgresql-13.service).

node_id=1
node_name='centos-1'
conninfo='host=192.168.1.11 port=5432 dbname=repmgr user=repmgr'
data_directory='/var/lib/pgsql/13/data'
primary_follow_timeout=10
failover = 'automatic'
promote_command='repmgr standby promote -f /var/lib/pgsql/repmgr.conf'
follow_command='repmgr standby follow -f /var/lib/pgsql/repmgr.conf -W --upstream-node-id=2'
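For reference, the Standby node's repmgr.conf mirrors this with the IDs, names, and addresses swapped. A sketch of what it would look like for this setup (values assumed from the addresses used in this post):

```
node_id=2
node_name='centos-2'
conninfo='host=192.168.1.12 port=5432 dbname=repmgr user=repmgr'
data_directory='/var/lib/pgsql/13/data'
primary_follow_timeout=10
failover='automatic'
promote_command='repmgr standby promote -f /var/lib/pgsql/repmgr.conf'
follow_command='repmgr standby follow -f /var/lib/pgsql/repmgr.conf -W --upstream-node-id=1'
```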


On the Master node, create the repmgr database and user. This is done once, on the Master node's postgresql database.
[postgres@center-1 ~]$ createuser -s repmgr
[postgres@center-1 ~]$ createdb repmgr -O repmgr

Create and register Master node
[postgres@center-1 ~]$ repmgr -f /var/lib/pgsql/repmgr.conf master register
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered
[postgres@center-1 ~]$

Verify that repmgr is working on the Master node.

[postgres@center-1 ~]$ psql -d repmgr -U repmgr
psql (13.10)
Type "help" for help.

repmgr=# \dt
              List of relations
 Schema |        Name        | Type  | Owner  
--------+--------------------+-------+--------
 repmgr | events             | table | repmgr
 repmgr | monitoring_history | table | repmgr
 repmgr | nodes              | table | repmgr
 repmgr | voting_term        | table | repmgr
(4 rows)

repmgr=# \x
Expanded display is on.
repmgr=# select * from nodes;
-[ RECORD 1 ]----+------------------------------------------------------
node_id          | 1
upstream_node_id | 
active           | t
node_name        | centos-1
type             | primary
location         | default
priority         | 100
conninfo         | host=192.168.1.11 port=5432 dbname=repmgr user=repmgr
repluser         | repmgr
slot_name        | 
config_file      | /var/lib/pgsql/repmgr.conf


On Standby node.

As root, stop the postgresql database.

[root@center-2 ~]# systemctl stop postgresql-13.service

On Standby node, perform a repmgr clone from the Master node.

[postgres@center-2 pgsql]$ repmgr -h 192.168.1.11 -U repmgr -d repmgr -f /var/lib/pgsql/repmgr.conf standby clone -F

NOTICE: destination directory "/var/lib/pgsql/13/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.1.11 user=repmgr dbname=repmgr
DETAIL: current installation size is 40 MB
INFO: replication slot usage not requested; no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
WARNING: data checksums are not enabled and "wal_log_hints" is "off"
DETAIL: pg_rewind requires "wal_log_hints" to be enabled
WARNING: directory "/var/lib/pgsql/13/data" exists but is not empty
NOTICE: -F/--force provided - deleting existing data directory "/var/lib/pgsql/13/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
pg_basebackup -l "repmgr base backup" -D /var/lib/pgsql/13/data -h 192.168.1.11 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /var/lib/pgsql/13/data start
HINT: after starting the server, you need to register this standby with "repmgr standby register"

As root, start up the postgres database, then register the Standby node to the database.


[root@center-2 ~]# systemctl start postgresql-13.service
[root@center-2 ~]# su - postgres

[postgres@center-2 ~]$ repmgr -f /var/lib/pgsql/repmgr.conf standby register
INFO: connecting to local node "centos-2" (ID: 2)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
INFO: standby registration complete
NOTICE: standby node "centos-2" (ID: 2) successfully registered

The Replication Manager setup is complete at this point.


There are ways to monitor the repmgr setup. The events table records everything that has happened in the replication cluster.


repmgr=# select node_id, event, successful, details from events;
 node_id |           event           | successful |                                           details                                            
---------+---------------------------+------------+----------------------------------------------------------------------------------------------
       1 | cluster_created           | t          | 
       1 | primary_register          | t          | 
       2 | standby_clone             | t          | cloned from host "192.168.1.11", port 5432; backup method: pg_basebackup; --force: Y
       2 | standby_register          | t          | standby registration succeeded; upstream node ID is 1
       1 | repmgrd_start             | t          | monitoring cluster primary "centos-1" (ID: 1)
       2 | repmgrd_start             | t          | monitoring connection to upstream node "centos-1" (ID: 1)
       2 | standby_promote           | t          | server "centos-2" (ID: 2) was successfully promoted to primary
       2 | repmgrd_failover_promote  | t          | node "centos-2" (ID: 2) promoted to primary; old primary "centos-1" (ID: 1) marked as failed
       2 | repmgrd_reload            | t          | monitoring cluster primary "centos-2" (ID: 2)
       1 | standby_clone             | t          | cloned from host "192.168.1.12", port 5432; backup method: pg_basebackup; --force: Y
       1 | repmgrd_standby_reconnect | t          | node restored as standby after 827 seconds, monitoring connection to upstream node -1
       1 | standby_register          | t          | standby registration succeeded; upstream node ID is 2 (-F/--force option was used)
       2 | child_node_new_connect    | t          | new standby "centos-1" (ID: 1) has connected
       1 | standby_promote           | t          | server "centos-1" (ID: 1) was successfully promoted to primary
       1 | repmgrd_failover_promote  | t          | node "centos-1" (ID: 1) promoted to primary; old primary "centos-2" (ID: 2) marked as failed
       1 | repmgrd_reload            | t          | monitoring cluster primary "centos-1" (ID: 1)
       2 | standby_clone             | t          | cloned from host "192.168.1.11", port 5432; backup method: pg_basebackup; --force: Y
       2 | repmgrd_standby_reconnect | t          | node restored as standby after 252 seconds, monitoring connection to upstream node -1
       2 | standby_register          | t          | standby registration succeeded; upstream node ID is 1 (-F/--force option was used)
       1 | child_node_new_connect    | t          | new standby "centos-2" (ID: 2) has connected
(20 rows)

repmgr=# select * from nodes;
 node_id | upstream_node_id | active | node_name |  type   | location | priority |                       conninfo                        | repluser | slot_name |        config_file         
---------+------------------+--------+-----------+---------+----------+----------+-------------------------------------------------------+----------+-----------+----------------------------
       1 |                  | t      | centos-1  | primary | default  |      100 | host=192.168.1.11 port=5432 dbname=repmgr user=repmgr | repmgr   |           | /var/lib/pgsql/repmgr.conf
       2 |                1 | t      | centos-2  | standby | default  |      100 | host=192.168.1.12 port=5432 dbname=repmgr user=repmgr | repmgr   |           | /var/lib/pgsql/repmgr.conf
(2 rows)

repmgr=# \dt
              List of relations
 Schema |        Name        | Type  | Owner  
--------+--------------------+-------+--------
 repmgr | events             | table | repmgr
 repmgr | monitoring_history | table | repmgr
 repmgr | nodes              | table | repmgr
 repmgr | voting_term        | table | repmgr
(4 rows)

repmgr has commands that essentially show the same information as querying the repmgr database directly.

repmgr -f /var/lib/pgsql/repmgr.conf cluster show
repmgr -f /var/lib/pgsql/repmgr.conf cluster crosscheck
repmgr -f /var/lib/pgsql/repmgr.conf cluster events

That's it. The Standby should not be able to perform DML; all DML should come from the Master node.

On Standby node.  

TestDB=# create table test1 (id serial);
ERROR: cannot execute CREATE TABLE in a read-only transaction

Create the test1 table on the Master node, then log back into the Standby node to attempt a data insertion. It still will not be allowed.

TestDB=# insert into test1 (select generate_series(1,1000));
ERROR: cannot execute INSERT in a read-only transaction
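On the Master node, pg_stat_replication gives a quick cross-check that the Standby is actually streaming. A sketch, run from psql as a superuser (connection details assumed from this setup):

```
psql -U postgres -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"
```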


Summary of the 3-step process for a switchover (rebuilding the former primary as a standby):
repmgr -h <IP_From_The_Primary_Node> -U repmgr -d repmgr -f repmgr.conf standby clone -F
sudo systemctl start postgresql-13.service
repmgr -f repmgr.conf standby register


Enjoy.


Monday, April 10, 2023

Postgresql: Creating Hot Standby

 

Create hot standby

As a fictitious example, I will assign two IPs: 192.168.1.11 as centos-1 and 192.168.1.12 as centos-2. This can be achieved by staging the first VM as centos-1, then cloning it to centos-2 and assigning it the new IP 192.168.1.12. Two VMs with the Postgresql database should be staged at this point. Both should be accessible as a normal database would be.

Creating a hot standby is relatively straightforward: create a new replicator database user, perform a pg_basebackup (which, with -R, generates the standby.signal file), and copy the centos-1 postgres data over to overwrite the centos-2 data directory.

Create replicator user
postgres=# create user replicator with replication password 'password';
CREATE ROLE

Make changes to the following on postgresql.conf
archive_mode = on
archive_command = 'test ! -f /var/lib/pgsql/archivelog/%f && cp %p /var/lib/pgsql/archivelog/%f'
listen_addresses = '*'
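Note that archive_command will start failing (and WAL will pile up) if the target directory does not exist, so create it up front. Assuming the path used above, and that the directory should belong to postgres:

```
mkdir -p /var/lib/pgsql/archivelog
chown postgres:postgres /var/lib/pgsql/archivelog   # when run as root
```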


Add a new line to pg_hba.conf allowing replication connections from the replicator user anywhere on the same subnet.
host replication replicator 192.168.1.0/24 md5


Perform pg_basebackup
[postgres@center-1 pgsql]$ pg_basebackup -h 192.168.1.11 -U replicator -p 5432 -D basebackup -Fp -Xs -P -R
Password:
33610/33610 kB (100%), 1/1 tablespace

[postgres@center-1 pgsql]$ ls
13 archivelog basebackup

Perform the full data copy over to the Slave node. This overwrites the entire data directory on the Slave node, including the standby.signal file that marks the node as a standby.

rsync -a basebackup/ postgres@192.168.1.12:/var/lib/pgsql/13/data/

The Hot Standby is complete. The standby node is read-only: SELECT queries work, but writes are not allowed.
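To confirm the roles, a simple check is pg_is_in_recovery(), which returns t on a hot standby and f on the primary. Run it on each node (user and connection details assumed from this setup):

```
psql -U postgres -c "SELECT pg_is_in_recovery();"
```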