pg_repack is one of the oldest and most widely used extension projects for PostgreSQL. It is so popular that even DBaaS providers could not avoid it. It is a “power tool” in the hands of a DBA for dealing with bloated/fragmented tables, and I can’t imagine a serious production deployment without it these days. It magically replaces a bloated, fragmented table with a fresh, fully packed one without holding an exclusive lock on the table for the duration of its processing**. This extension has made PostgreSQL’s built-in commands like VACUUM FULL and CLUSTER almost redundant.
**Momentary AccessExclusive locks are required. Please see the discussion below.
But unfortunately, regular and repeated usage of pg_repack has increased everyone’s comfort level with it (myself included). As a consequence, it gets recommended as a solution for everything, almost as a panacea. But recently I started coming across cases like a scheduled job running pg_repack on every table. That rang an alarm bell, and it started to feel like the overuse of antibiotics. So I thought of writing about how pg_repack works and how it can affect your system while it is running, aimed at the regular user who doesn’t want to do a detailed study, because a better understanding helps in making an informed decision on where it should be used.
pg_repack has a lot of functionality (options). In this blog post, my intention is to discuss only how it does the basic “repack”ing of a fragmented/bloated table.
Pre-Checks and Ensuring Sanity
First, pg_repack ensures that it is installed and the extension is available in the database, that we are running pg_repack as a superuser, and so on. It creates a provision for cleaning up the temporary objects it creates. It collects metadata about the table and associated objects like indexes, TOAST, and triggers, because tracking the table and its associated objects is very important. Objects like invalid indexes and conflicting triggers are checked for. We are not going to discuss this in detail.
The actual processing starts with obtaining an advisory lock on the table’s OID, to make sure no other pg_repack is working on the table. This is not so much a concern with full-table repacks, but mainly ensures that index-only repacks don’t interfere with each other or with a full-table repack. If another pg_repack is in progress, the repack attempt will get a message as follows and exit.
ERROR: Another pg_repack command may be running on the table. Please try again later.
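The advisory lock is visible from any other session through the standard pg_locks system view; a minimal, hedged check (the columns are from the regular catalog, not from pg_repack itself):

```sql
-- Advisory locks currently held in the database; while pg_repack is
-- working on a table, you should see an entry whose key is derived
-- from that table's OID.
SELECT locktype, objid, pid, granted
FROM pg_locks
WHERE locktype = 'advisory';
```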
Create temporary objects while holding AccessExclusive lock on the table
Yes, correct: pg_repack needs the heavyweight AccessExclusive lock, but only temporarily. pg_repack attempts to take an ACCESS EXCLUSIVE lock on the table by executing a statement as follows:
LOCK TABLE <tablename> IN ACCESS EXCLUSIVE MODE
By default, pg_repack waits up to 60 seconds for this lock; the wait can be tuned with the --wait-timeout option. Once this wait-time is over, pg_repack will start trying to cancel the conflicting statements. Users may see messages as follows for each of the attempts:

WARNING: canceling conflicted backends
WARNING: canceling conflicted backends
...
So the point to be noted is: please avoid running pg_repack when there is a lot of concurrent activity on the table. New sessions are still allowed to take conflicting locks like ACCESS SHARE concurrently, so a session waiting for an ACCESS EXCLUSIVE lock may need to wait indefinitely. We should select a proper low-activity time window for pg_repack work.
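Before starting, one way to judge concurrent activity on the table is to look at who currently holds (or waits for) locks on it; a sketch using the standard catalogs, with public.t1 as a placeholder table name:

```sql
-- Sessions holding or waiting for locks on the target table
SELECT a.pid, a.state, l.mode, l.granted, left(a.query, 60) AS query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'public.t1'::regclass;
```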
These cancel attempts will continue for another round of “wait-timeout” seconds. If the AccessExclusive lock is still not obtained after this second round, pg_repack escalates to terminating every conflicting session. In other words, pg_repack will terminate sessions once the total wait time exceeds double the wait-timeout. This could be problematic for application connections that get abruptly terminated, and it can lead to outages if the application layer does not handle it gracefully.
pg_repack may emit a message as follows:

WARNING: terminating conflicted backends

while the terminated sessions report errors like these in the PostgreSQL log:

2021-06-17 06:28:46.317 UTC [2761] FATAL: terminating connection due to administrator command
2021-06-17 06:28:46.317 UTC [2758] FATAL: terminating connection due to administrator command
So the point to be noted is: pg_repack can terminate sessions that stand in the way of obtaining the AccessExclusive lock, which can lead to unexpected outages or misbehavior of the application.
Once double the wait-timeout is crossed, pg_repack may just proceed to terminate the sessions. But again, this behavior can be controlled using the --no-kill-backend option. If this option is specified, pg_repack will respect all concurrent sessions and cancel itself instead of attempting to cancel or terminate other sessions.
pg_repack may emit a message as follows:

WARNING: timed out, do not cancel conflicting backends
INFO: Skipping repack public.t1 due to timeout
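Putting these options together, a cautious invocation might look like the following (table name, database name, and timeout value are placeholders; both flags are documented pg_repack options):

```shell
pg_repack --table=public.t1 --wait-timeout=30 --no-kill-backend mydb
```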
I believe this is more desirable in production systems. So the point to be noted is: use the --no-kill-backend option whenever you deal with critical systems.

While the AccessExclusive lock is held, pg_repack creates all its temporary objects, including the substitute table with the same structure. First, it creates a composite type representing the table’s primary key:
CREATE TYPE repack.pk_16423 AS (id integer)

For a table with a multi-column primary key, it looks like:

CREATE TYPE repack.pk_16824 AS (id integer, id1 integer, id2 timestamp without time zone);

It then creates the log table that will capture the changes happening on the original table:

CREATE TABLE repack.log_16423 (id bigserial PRIMARY KEY, pk repack.pk_16423, row public.t1)
Now pg_repack creates a trigger on the table to be repacked, so that whenever there is DML on the table, the corresponding information is captured in the log table created above. This is done using an AFTER INSERT OR DELETE OR UPDATE trigger. For example:
CREATE TRIGGER repack_trigger AFTER INSERT OR DELETE OR UPDATE ON public.t1
FOR EACH ROW EXECUTE PROCEDURE repack.repack_trigger(
  'INSERT INTO repack.log_16423(pk, row) VALUES(
     CASE WHEN $1 IS NULL THEN NULL ELSE (ROW($1.id)::repack.pk_16423) END, $2)')
Since pg_repack is holding the AccessExclusive lock at this point, there won’t be any concurrent DMLs yet; that is going to change in the following stages.
pg_repack plays an important trick at this stage, before releasing the AccessExclusive lock, as this comment in the source code says:
/* While we are still holding an AccessExclusive lock on the table, submit
 * the request for an AccessShare lock asynchronously from conn2.
 * We want to submit this query in conn2 while connection's
 * transaction still holds its lock, so that no DDL may sneak in
 * between the time that connection commits and conn2 gets its lock.
 */
Yes, pg_repack opens another connection to the database and sends an AccessShare lock request through it. This prevents any DDL from sneaking in during the switch to the AccessShare lock: the moment the main connection commits, the AccessShare lock will be granted to the second connection. So pg_repack uses two database connections to carry out its work.
But still, there is a chance that some DDL can interfere. So pg_repack kills any concurrent DDL against the table by default.
Once this stage is complete, pg_repack can release the AccessExclusive lock by COMMITting the first connection’s transaction, at which point the AccessShare lock request from the second connection will be granted. This COMMIT is very special: the log table and the trigger on the table are committed, so they become visible to the entire system from this point onwards.
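Once that COMMIT happens, the trigger is visible from any session; a hedged way to confirm it (the OID suffix in the repack object names will differ on your system):

```sql
-- Expect to see 'repack_trigger' listed while the repack is in progress
SELECT tgname
FROM pg_trigger
WHERE tgrelid = 'public.t1'::regclass
  AND NOT tgisinternal;
```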
Copying rows/tuples to a temporary table
Copying tuples from the table to the new table is performed at the SERIALIZABLE isolation level, because there shouldn’t be any inconsistency between the data getting into the log table and the temporary, substitute table that pg_repack is going to create.
BEGIN ISOLATION LEVEL SERIALIZABLE
Since the AccessExclusive lock has been released, concurrent sessions can proceed with their DMLs and SELECT queries; only DDLs are blocked. So we can say that the table is available for transactions and queries.
Prior to the data copy, the log table is emptied, because pg_repack needs only the log data captured from the start of the data copy:

DELETE FROM repack.log_16423
Again, pg_repack will attempt to kill any session that might be waiting to perform a DDL, and then takes an AccessShare lock on the table. Since the other connection is already holding an AccessShare lock, this can be obtained without much problem.
At this stage, pg_repack creates the substitute table that is going to replace the original table, with the exact same structure but without any data. It is a CTAS statement, something like:
CREATE TABLE repack.table_16423... AS SELECT col1,col2,col3... FROM ONLY public.t1 WITH NO DATA
followed by the data copy:
INSERT INTO repack.table_16423 SELECT col1,col2,col3... FROM ONLY public.t1
Once the data copy is over, the transaction is COMMITted. This completes the heaviest stage of repacking in terms of load and WAL generation.
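If you want a rough idea of how much WAL a copy like this produces, you can sample the WAL insert position before and after using standard functions (PostgreSQL 10 and later); the LSN literal below is a placeholder for the value you noted earlier:

```sql
-- Before the repack: note the current WAL position
SELECT pg_current_wal_lsn();

-- After the repack: bytes of WAL written since the noted position
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/3000000'::pg_lsn));
```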
Indexes and keys are created on this temporary, substitute table at this stage.
Apply the CDC log to a temporary table
Please remember that pg_repack’s main connection is not holding any lock on the table at this stage (only the second connection’s AccessShare lock remains), so there is nothing blocking transactions (DMLs). Depending on how long the data copy took and how many concurrent transactions ran during it, there could be a lot of CDC (Change Data Capture) entries in the log table. These need to be copied to the new temporary/substitute table.
This logic is implemented as a C function in pg_repack; you may refer to the source code of repack_apply. It reads all the data from the log table and processes the INSERTs, UPDATEs, and DELETEs. In order to speed up the repeated operations, prepared statements are used. Finally, all the processed rows are deleted from the log table.
Swapping the original table with a temporary table
This is performed by the second connection, because it already holds an AccessShare lock; it escalates that lock to an AccessExclusive lock:
LOCK TABLE public.t1 IN ACCESS EXCLUSIVE MODE
The CDC apply is performed once again (the same repack_apply) while holding the AccessExclusive lock on the table, so any new entries that have appeared in the log table are also processed.
The original table and the temporary table are then swapped quickly by executing the repack_swap function:
SELECT repack.repack_swap('16423');
This is the most beautiful and powerful part of pg_repack, implemented in the C function repack_swap. Not just the tables are swapped: ownership, associated TOAST tables and indexes, indexes, and dependencies are also swapped. Even the OIDs are swapped, so the OID of the table remains the same after pg_repack. The swapping work completes with a COMMIT.
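Because the OIDs themselves are exchanged, the table keeps its OID across the repack; an easy hedged check is to record it before and compare afterwards:

```sql
-- This OID should be identical before and after pg_repack runs
SELECT 'public.t1'::regclass::oid;
```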
Final cleanup
pg_repack uses its built-in C function repack_drop to clean up all the temporary objects. To prevent any concurrent session from acquiring a lock on the table that could block the cleanup, an AccessExclusive lock is obtained before the cleanup. This is the third time an AccessExclusive lock is placed on the table.
Summary
pg_repack is one of the most powerful, popular, and useful extensions. We encourage its usage wherever applicable, with proper supervision, but please avoid over-usage. As I tried to explain, we should expect a good amount of data movement from the original table to the temporary table, trigger writes to the log table, data copy from the log table to the temporary table, and so on. So we should also expect higher WAL generation. Considering all these implications, pg_repack should be run in a low-activity time window to avoid undesirable consequences.
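One more practical implication: the copy phase needs enough free disk for a second full copy of the table and its indexes, so it is worth checking the size up front (a sketch; public.t1 is a placeholder):

```sql
-- Total on-disk size of the table, its indexes, and TOAST data;
-- roughly this much free space is needed for the substitute copy
SELECT pg_size_pretty(pg_total_relation_size('public.t1'));
```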
Some of the important points to reiterate for end-users are:
- pg_repack needs to acquire the heavyweight AccessExclusive lock multiple times, but only temporarily.
- In a high-concurrency situation, it can be almost impossible to get an AccessExclusive lock.
- pg_repack, by default, will attempt to cancel conflicting statements if it is not able to gain the AccessExclusive lock within wait-timeout.
- It may proceed to terminate sessions if the total wait exceeds double the wait-timeout. This could lead to undesirable outcomes and outages.
- Defaults of pg_repack may not be good for critical systems. Use the --no-kill-backend option to make it gentler.
- No DDL is allowed against a table undergoing pg_repack, and any session attempting one might get killed.
Comments

The “followed by the data copy:” sentence is malformatted and looks like part of the SQL session, so I read over it on first pass and then wondered where the actual copying happens.
Thank you for your feedback, @Michael Rikmas. Yes, it was a formatting mistake and it has been corrected as per your feedback.
Hope you liked the article.
We are facing issues while repacking tables for which we have set the replica identity to FULL, to fulfill our Debezium use case.
“Facing issues” is a very broad statement. Without more details, others won’t be able to understand.
Seeing the below issue while running pg_repack; the DB is hosted on AWS Aurora PostgreSQL 12.x.
Table size is 650GB plus, including indexes.
ERROR: query failed: ERROR: could not write to file “base/pgsql_tmp/pgsql_tmp17644.58”: No space left on device
Man, this seems to be a bigger issue. AFAIK, you should have at least twice the table size free on the DB disk before running pg_repack. That is one of the basic validations of this extension. This seems to be an outage-like situation. Can you provide some insights into what happened after this issue?

Thank you for the insight into pg_repack; this I haven’t found documented elsewhere.
There is still one mystery to me: why does pg_repack require a primary key or a UNIQUE key?
According to your article, it must be because of the CDC table:
CREATE TABLE repack.log_16423 (id bigserial PRIMARY KEY, pk repack.pk_16423, row public.t1)
We see it contains the primary key of our table as a separate column. But where is it used, and why? It baffles me that I can’t run pg_repack on a table without a UNIQUE column.