One of the reasons I’ve never strayed far from PostgreSQL is that I am always discovering features that overcome all sorts of interesting problems. Although, to be honest, from time to time I’ve had the feeling that some features are a solution in search of a problem. Take, for example, exporting transaction snapshots.

Originally implemented in version 9.2, snapshot exports allow multiple transactions to share an identical view of the same database state. Back then I wasn’t certain how I could use it, until version 9.3 presented a true use case when parallel dumps were introduced to pg_dump.

Here’s a simple example; let’s start by creating two tables with lots of data:
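The exact DDL isn’t critical; something along these lines works, with the column names c1, c2, and c3 chosen to match the queries used later in this post and generate_series providing the bulk data:

    create table t1 (
        c1 int,
        c2 text default md5(random()::text),
        c3 timestamptz default now()
    );
    create table t2 (like t1 including defaults);

    -- adjust the generate_series arguments to taste (see the TIP below)
    insert into t1 (c1) select generate_series(1, 5000000);
    insert into t2 (c1) select generate_series(1, 5000000);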

TIP: Adjusting the arguments to the generate_series function lets you populate the tables with an amount of data that best fits your hardware’s capabilities.

The traditional invocation of a logical dump uses a single CPU/thread, archiving the tables one at a time. The CLI utility time generates timing statistics, returning the time it takes to perform the dump using the compressed “custom” format:
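For example, assuming the database is named db01 (the same database used in the psql sessions later on), an invocation looks something like this:

    time pg_dump -Fc -f db01.dump db01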

An alternate invocation generates the dump using the “directory” format:
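For example, writing into a new directory (the directory name is arbitrary):

    time pg_dump -Fd -f db01_dir db01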

Thus far the execution utilizes a single CPU. Now look at the execution time when the -j, or --jobs, switch is used, so that multiple connections generate the dump:
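For example, with two worker jobs:

    time pg_dump -Fd -j 2 -f db01_dir_parallel db01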

Notice the number of processes running in the background. The third connection is a leader process which coordinates the two worker processes:
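You can see them with ps at the operating system level, or from another psql session; pg_dump labels its connections with the application name pg_dump by default, so a query along these lines lists the leader and worker connections while the dump runs:

    select pid, application_name, state, query
    from pg_stat_activity
    where application_name = 'pg_dump';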

Without the synchronized snapshot feature, the different worker jobs wouldn’t be guaranteed to see the same data in each connection, which could lead to an inconsistent backup.

One important caveat to remember is that the pg_dump jobs switch only works with the “directory” format. Attempting to use the jobs switch with any of the single-file formats fails:
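For example, combining -j with the custom format is rejected (the exact error wording varies between versions, but the gist is the same):

    pg_dump -Fc -j 2 -f db01.dump db01
    pg_dump: error: parallel backup only supported by the directory format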

There is a full description of other caveats when using the jobs switch in the PostgreSQL documentation.

So we’ve shown how it works with pg_dump, but can we go further? … YES!

We can replicate the behavior using the snapshot synchronization function pg_export_snapshot().

Continuing with the two previously created tables, let’s create another scenario. Look at the table below for each step:

  • STEP 1: Three psql sessions log in and interact with tables t1 and t2 in tandem.
  • STEP 2: Session 1 inserts new rows into both tables every five seconds. Session 2 sees the data differently than session 3; note the three-second delay in session 3’s query, which makes it problematic for the two sessions to see the same dataset at the same time.
  • STEP 3: Session 1 continues updating tables t1 and t2, but now sessions 2 and 3 share the same exported snapshot.
  • STEP 4: While session 1 continues updating tables t1 and t2, sessions 2 and 3 see the same datasets, i.e., they are synchronized.
  • STEP 5: Sessions 2 and 3 actively copy data at the same time without fear of inconsistency.
  • STEP 6: The COPY is complete, so let’s finish up by committing the transactions.

STEP 1

    Session 1:  psql db01
    Session 2:  psql db01
    Session 3:  psql db01

STEP 2

    Session 1:
        with a(x) as (select max(c1) from t1),
             b as (insert into t1
                   select generate_series(x+1,x+10)
                   from a returning *),
             c as (insert into t2
                   select generate_series(x+1,x+10)
                   from a returning *)
        select x as increment from a;
        \watch 5

    Session 2:
        with a(x) as (select max(c1) from t1)
        select c1,md5(c2),c3 from t1,a where c1=x
        union all
        select c1,md5(c2),c3 from t2,a where c1=x;

    Session 3:
        with a(x) as (select max(c1) from t1)
        select c1,md5(c2),c3 from t1,a where c1=x
        union all
        select c1,md5(c2),c3 from t2,a,pg_sleep(3) where c1=x;

STEP 3

    Session 1:  step 2 is active
    Session 2:
        -- let's export the snapshot
        begin transaction isolation level repeatable read;
        select pg_export_snapshot();
        -- returns 00000003-000021CE-1

    Session 3:
        -- let's import the snapshot
        begin transaction isolation level repeatable read;
        set transaction snapshot '00000003-000021CE-1';

STEP 4

    Session 1:  step 2 is active
    Session 2:
        with a(x) as (select max(c1) from t1)
        select c1,md5(c2),c3 from t1,a where c1=x
        union all
        select c1,md5(c2),c3 from t2,a where c1=x;

    Session 3:
        with a(x) as (select max(c1) from t1)
        select c1,md5(c2),c3 from t1,a where c1=x
        union all
        select c1,md5(c2),c3 from t2,a,pg_sleep(3) where c1=x;

STEP 5

    Session 1:  step 2 is active
    Session 2:
        copy t1 to program 'gzip > t1.gz';

    Session 3:
        copy t2 to program 'gzip > t2.gz';

STEP 6

    Session 1:  step 2 is terminated
    Session 2:
        -- commit or rollback, it's your choice
        commit;

    Session 3:
        -- commit or rollback, it's your choice
        commit;

And there you have it, a neat little trick exporting snapshots! Of course, there’s more you can do than just copying tables in a multi-threaded fashion, but hopefully this little exposé will lead to more opportunities to simplify your life while working with Postgres.

FAQs

How does the performance of using snapshots compare to traditional backup methods in terms of time and system resources?

Exported snapshots themselves add very little overhead: no data is copied when a snapshot is exported; the participating sessions simply share one consistent MVCC view of the database. This is what lets operations such as logical backups be parallelized without table locking or reduced availability. The time and resources a snapshot-based dump consumes are dominated by the size of the database, the level of concurrent activity, and the hardware in use, not by the snapshot machinery itself.

Are there any specific limitations or considerations when using snapshots in large-scale production environments?

In large-scale production environments, the main consideration is that an exported snapshot is only usable while the exporting transaction remains open, and a long-running transaction prevents VACUUM from removing dead row versions. On databases with high transaction volumes this can lead to table bloat, so keep snapshot-based operations as short as practical and monitor the system for the extra load they introduce while they run.
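A query along these lines is one way to keep an eye on that; it lists all open transactions, oldest first, not just snapshot holders:

    select pid, application_name, state, now() - xact_start as xact_age
    from pg_stat_activity
    where xact_start is not null
    order by xact_age desc;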

How do snapshot exports impact concurrent transactions and overall database performance during their creation and use?

Exporting a snapshot gives other transactions a static view of the database as of a certain moment; it does not halt ongoing transactions. Its cost is indirect: row versions the snapshot can still see cannot be vacuumed away while it is in use, so an extended backup duration keeps that window open longer. Keeping the snapshot-holding transactions as short as possible, and committing them as soon as the copy is done, reduces such side effects.
