These scripts demonstrate the command line interface, CLI, “pg_rewind” re-provisioning a formerly deprecated PRIMARY host, pg1, into a viable STANDBY. Although the scripts have been written expressly for a custom-made set of two Linux containers using CENTOS7, it has been written in a form that is hopefully clear and concise enough that they can be of use to you and can be customized to suit your particular needs.
There are five scripts:
- 01.setup_lxc_environment.sh: Create two Linux containers from a custom made image
- 02.setup_replication.sh: Setup a 2 node async replication cluster
- 03.switchover.sh: Invoke a switchover i.e. stop pg1 and promote pg2
- 04.reprovision_pg1.sh: Here’s where the magic takes place
- 05.validate.sh: A simple test validating replication works correctly
Script 1: 01.setup_lxc_environment.sh
This script starts by destroying preexisting containers and rebuilding them from an already preexisting image that was custom-made, with postgres already installed. Although not necessary, the IP addresses for each container were hard-coded. Both containers have the postgres 12 binaries installed, with an already initialized data cluster.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 | #!/bin/bash set -e echo "=== stage 1 ===" for u in pg1 pg2 do set +e lxc stop $u 2>/dev/null lxc rm $u lxc launch template-CENTOS7-pg $u set -e done echo "=== stage 2 ===" for u in pg1 pg2 do lxc exec $u bash <<_eof_ hostnamectl set-hostname $u rm -f /var/lib/pgsql/12/data/log/* _eof_ done for u in pg1 pg2 do lxc stop $u done echo "=== stage 3 ===" for u in pg1 pg2 do lxc network attach lxdbr0 $u eth0 done echo "=== stage 4 ===" echo "sleeping ..." && sleep 2s lxc config device set pg1 eth0 ipv4.address 10.231.38.111 lxc config device set pg2 eth0 ipv4.address 10.231.38.112 for u in pg1 pg2 do lxc start $u done echo "sleeping ..." && sleep 8s lxc ls -cn4 pg echo "DONE" |
Script 2: 02.setup_replication.sh
The two postgres configuration files, pg_hba.conf and postgresql.conf, are configured for remote access with verbose logging providing plenty of information and is set for replication service.
While all the runtime parameters are important, pay special attention to these three runtime parameters:
- hot_standby_feedback: for replication service
- wal_keep_segments: you want to hold onto enough wal segments for the standby, just in case…
- wal_log_hints: pg_rewind requires this parameter otherwise it won’t work
The CLI pg_receivewal is useful in a script because it can test if a slot has already been created, thus pg_basebackup simply looks for the slot and is not required to create one.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 | #!/bin/bash set -e echo "=== stage 1 ===" lxc exec pg1 bash <<_eof_ su - postgres <<_eof1_ source /etc/profile.d/sh.local echo " # TYPE DATABASE USER ADDRESS METHOD # "local" is for Unix domain socket connections only local all all trust # IPv4 local connections: host all all 127.0.0.1/32 md5 host all all 0.0.0.0/0 md5 # IPv6 local connections: host all all ::1/128 md5 host all all ::0/0 md5 # Allow replication connections from localhost, by a user with the # replication privilege. local replication all trust host replication all 127.0.0.1/32 md5 host replication all 0.0.0.0/0 md5 host replication all ::1/128 md5 host replication all ::0/0 md5 " > /var/lib/pgsql/12/data/pg_hba.conf echo " archive_command='/bin/true' archive_mode='on' autovacuum_freeze_max_age=200000000 autovacuum_multixact_freeze_max_age=400000000 deadlock_timeout='1s' hot_standby_feedback='on' idle_in_transaction_session_timeout=0 listen_addresses='*' lock_timeout=0 log_autovacuum_min_duration=0 log_checkpoints='on' log_connections='on' log_disconnections='on' log_duration='on' log_filename='postgresql-%a.log' logging_collector='on' log_hostname='on' log_line_prefix=' app=%a user=%u error=%e line=%l vtxid=%v txid=%x tstamp=%m [%p] ' log_lock_waits='on' log_min_duration_statement=0 log_replication_commands='on' log_rotation_size=0 log_statement='ddl' log_temp_files=0 log_truncate_on_rotation='on' max_standby_archive_delay='30s' max_standby_streaming_delay='30s' old_snapshot_threshold=-1 statement_timeout=0 track_activity_query_size=2048 track_commit_timestamp='on' track_functions='pl' track_io_timing='on' vacuum_defer_cleanup_age=0 wal_keep_segments=60 wal_log_hints='on' wal_receiver_timeout='60s' wal_sender_timeout='60s' " > /var/lib/pgsql/12/data/postgresql.auto.conf _eof1_ systemctl start postgresql-12 _eof_ echo "=== stage 2 ===" lxc exec pg2 bash <<_eof_ su - postgres <<_eof1_ rm -rf /var/lib/pgsql/12/data/ /usr/pgsql-12/bin/pg_receivewal \ --slot=pg2 \ --create-slot \ --if-not-exists \ -d 'host=pg1 user=postgres password=postgres' /usr/pgsql-12/bin/pg_basebackup \ -d 'host=pg1 user=postgres dbname=postgres' \ -D /var/lib/pgsql/12/data/ \ -l initial_backup_pg2 \ -c fast \ -R -P \ -S pg2 rm -f /var/lib/pgsql/12/data/log/*.log _eof1_ systemctl start postgresql-12 && sleep 2s _eof_ |
Script 03.switchover.sh
It’s more accurate to call this a switchover than a failover. Notice that a new WAL is generated on pg1 before it’s stopped.
Promoting the STANDBY, pg2, is extremely easy in postgres these days. Just log in as the superuser and execute the SQL “select pg_promote()”.
The final step to this script creates and populates a table with some records.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 | #!/bin/bash set -e echo "=== stage 1 ===" lxc exec pg1 bash <<_eof_ psql 'host=pg1 user=postgres password=postgres' <<_eof1_ checkpoint; select * from pg_walfile_name(pg_switch_wal()); _eof1_ systemctl stop postgresql-12 && sleep 2s _eof_ echo "=== stage 2 ===" lxc exec pg2 bash <<_eof_ psql 'host=pg2 user=postgres password=postgres' <<_eof1_ select pg_promote(); drop database if exists db01; create database db01; \c 'host=pg2 user=postgres password=postgres dbname=db01' select *,'hello world'::text as comments into t1 from (select * from generate_series(1,1E6))t; _eof1_ _eof_ echo "Host pg1 is shutdown and host pg2 is promoted as a read-write node" echo "DONE" |
Script 04.reprovision_pg1.sh
And here’s where the magic takes place.
Just before rewinding pg1 begins, a slot is explicitly added to PRIMARY host pg2. Although in this particular instance it isn’t really needed to make the replication work, since runtime parameter “wal_keep_segments”, which no longer exists in postgres version 13, is configured to retain 60 WAL segments, more than enough WALS for this demonstration.
As soon as pg_rewind completes, host pg1 needs some final tuning preparing it for STANDBY service:
- Configuration file postgresql.auto.conf is updated
- Creating file standby.signal informs postgres that this host is meant to replicated from another one.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | #!/bin/bash set -e lxc exec pg1 bash <<_eof_ /usr/pgsql-12/bin/pg_receivewal \ --create-slot \ --if-not-exists \ --slot pg1 \ -d 'host=pg2 user=postgres password=postgres' su - postgres <<_eof1_ /usr/pgsql-12/bin/pg_rewind \ --target-pgdata /var/lib/pgsql/12/data \ --source-server='user=postgres password=postgres host=pg2' echo " hot_standby = 'on' primary_conninfo = 'user=postgres password=postgres host=pg2 port=5432' primary_slot_name = 'pg1' recovery_target_timeline = 'latest' " >> /var/lib/pgsql/12/data/postgresql.auto.conf touch /var/lib/pgsql/12/data/standby.signal _eof1_ systemctl start postgresql-12 && sleep 2s _eof_ echo "DONE: Host pg1 is now a viable READ-ONLY STANDBY to the PRIMARY host pg2" |
Script 05.validate.sh
Validation tests a number of conditions and includes:
- Is pg1 a read-only standby?
- Can we see the table that was created after pg2 was promoted?
- Do we see the database that was created after pg1 was re-provisioned proving that replication is active?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | #!/bin/bash set -e lxc exec pg1 bash <<_eof_ psql 'host=pg2 user=postgres password=postgres' <<_eof1_ \qecho ---HOST PG2, READ-WRITE PRIMARY--- drop database if exists db_test; create database db_test; select pg_is_in_recovery from pg_is_in_recovery(),pg_sleep(3); \l _eof1_ psql 'host=pg1 user=postgres password=postgres dbname=db01' <<_eof1_ \qecho ---HOST PG1, READ-ONLY STANDBY--- select * from pg_is_in_recovery(); select count(*) from t1; \l _eof1_ _eof_ echo "DONE" |
Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.