After working with PostgreSQL for so many years it’s sometimes surprising to see some of the utilities, to a certain degree, have capabilities that remain relatively unknown to more than a few people. So let me introduce you to manifests.
Both pg_dump and pg_restore have switches enabling us to zone in on specific objects of interest such as functions, tables, data, etc., as we create and restore a logical dump. But there comes a time when expressing this kind of granularity as arguments can become overly complex when we look at dozens, if not hundreds, of objects.
Enter manifests
PostgreSQL dumps store not only the objects and data of a database but their descriptive meta-data too. When generated as a text file, a PostgreSQL manifest details each object on a single line. These lines can then be edited by either removing or reordering them. The edited file can then be used with pg_restore to restore the entities in the prescribed order.
Here’s how it works; the first step is to generate a standard dump using the custom compression argument -Fc
1 2 | # generate the logical dump pg_dump -C -Fc mydatabase > mydatabase.db |
Once the dump has been created the manifest is generated using pg_restore and the -l switch, which goes to standard output:
1 2 | # generate a manifest, the meta data, sic description, of the logical dump pg_restore -l mydatabase.db > mydatabase_manifest.ini |
And like a recipe, the manifest is then used to read and restore the dump following its directions:
1 2 | # restore the dump using the manifest pg_restore -L mydatabase_manifest.ini -d postgres mydatabase.db |
Of course, in between generating and reading the manifest, one takes the opportunity of editing it too:
So let’s use a more realistic i.e. pgbench. The first step initializes the benchmarking database, notice the use of foreign keys:
1 2 3 | dropdb --if-exists pgbench createdb pgbench pgbench -i --foreign-keys -s 50 pgbench |
Let’s complicate the landscape by adding a number of views including a materialized view:
1 2 3 4 5 6 | psql pgbench <<_eof_ create or replace view v1_pgbench_accounts as select * from pgbench_accounts; create or replace view v2_pgbench_accounts as select * from pgbench_accounts; create or replace view v3_pgbench_accounts as select * from pgbench_accounts; create materialized view m_pgbench_accounts as select * from v1_pgbench_accounts order by random() limit 1e3; _eof_ |
And now let’s create the database dump. Remember, manifests can only be generated when the dump has been created using the custom format switch -Fc.
1 | pg_dump -Fc -C pgbench > pgbench.db |
Now, let’s generate the manifest. I’ve opted to use the ini extension to the file name as my code editor recognizes the semicolon as a comment line and colors it accordingly:
1 | pg_restore -C -l pgbench.db > pgbench_manifest.ini |
As per the normal process of generating and restoring a logical dump, one sees from the output listed below the command creating the database, that’s from the -C switch, followed by the views, table definitions, data population, and the constraints, and finishing with the data population of the materialized view:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | ; ; Archive created at 2022-08-24 10:21:46 PDT ; dbname: pgbench ; TOC Entries: 25 ; Compression: -1 ; Dump Version: 1.14-0 ; Format: CUSTOM ; Integer: 4 bytes ; Offset: 8 bytes ; Dumped from database version: 14.0 ; Dumped by pg_dump version: 14.0 ; ; ; Selected TOC Entries: ; 3361; 1262 25332 DATABASE - pgbench postgres 211; 1259 25339 TABLE public pgbench_accounts postgres 213; 1259 25380 VIEW public v1_pgbench_accounts postgres 216; 1259 25392 MATERIALIZED VIEW public m_pgbench_accounts postgres 212; 1259 25342 TABLE public pgbench_branches postgres 209; 1259 25333 TABLE public pgbench_history postgres 210; 1259 25336 TABLE public pgbench_tellers postgres 214; 1259 25384 VIEW public v2_pgbench_accounts postgres 215; 1259 25388 VIEW public v3_pgbench_accounts postgres 3353; 0 25339 TABLE DATA public pgbench_accounts postgres 3354; 0 25342 TABLE DATA public pgbench_branches postgres 3351; 0 25333 TABLE DATA public pgbench_history postgres 3352; 0 25336 TABLE DATA public pgbench_tellers postgres 3200; 2606 25354 CONSTRAINT public pgbench_accounts pgbench_accounts_pkey postgres 3202; 2606 25350 CONSTRAINT public pgbench_branches pgbench_branches_pkey postgres 3198; 2606 25352 CONSTRAINT public pgbench_tellers pgbench_tellers_pkey postgres 3207; 2606 25360 FK CONSTRAINT public pgbench_accounts pgbench_accounts_bid_fkey postgres 3205; 2606 25375 FK CONSTRAINT public pgbench_history pgbench_history_aid_fkey postgres 3203; 2606 25365 FK CONSTRAINT public pgbench_history pgbench_history_bid_fkey postgres 3204; 2606 25370 FK CONSTRAINT public pgbench_history pgbench_history_tid_fkey postgres 3206; 2606 25355 FK CONSTRAINT public pgbench_tellers pgbench_tellers_bid_fkey postgres 3355; 0 25392 MATERIALIZED VIEW DATA public m_pgbench_accounts postgres |
Once generated, the manifest can now be tailored to our requirements which in this case is to restore a subset of the database dump as follows:
- Create database pgbench_restore
- Restore only view v1_pgbench_accounts
- Restore and populate table pgbench_accounts
- Restore and populate materialized view m_pgbench_accounts
You will note that in order to restore table pgbench_accounts it will be necessary to not add all the constraints, i.e. remove the FK constraints.
After some editing here’s what the resulting manifest looks like:
1 2 3 4 5 6 | 211; 1259 25339 TABLE public pgbench_accounts postgres 213; 1259 25380 VIEW public v1_pgbench_accounts postgres 216; 1259 25392 MATERIALIZED VIEW public m_pgbench_accounts postgres 3353; 0 25339 TABLE DATA public pgbench_accounts postgres 3200; 2606 25354 CONSTRAINT public pgbench_accounts pgbench_accounts_pkey postgres 3355; 0 25392 MATERIALIZED VIEW DATA public m_pgbench_accounts postgres |
With a little creativity it’s not that big of a leap to dynamically generate the appropriate manifest:
1 2 3 | GREP="TABLE public pgbench_accounts|VIEW public v1_pgbench_accounts|MATERIALIZED|DATA public pgbench_accounts|pgbench_accounts_pkey|DATA public m_pgbench_accounts" pg_restore -C -l pgbench.db | grep -E "$GREP" > pgbench_manifest.ini |
And now we proceed with the final restoration steps:
1 2 3 4 5 6 7 8 9 10 11 | dropdb --if-exists pgbench_restore createdb pgbench_restore # # take a look at the generated SQL before uploading the logical dump # pg_restore -1 -L pgbench_manifest.ini -f - | less -S # # the logical dump can now be uploaded # pg_restore -1 -L pgbench_manifest.ini -d pgbench_restore pgbench.db |
Taking a look at the resulting database confirms success:
1 | psql pgbench_restore -c 'd' |
1 2 3 4 5 6 | List of relations Schema | Name | Type | Owner | Persistence | Access method | Size --------+---------------------+-------------------+----------+-------------+---------------+--------- public | m_pgbench_accounts | materialized view | postgres | permanent | heap | 144 kB | public | pgbench_accounts | table | postgres | permanent | heap | 640 MB | public | v1_pgbench_accounts | view | postgres | permanent | | 0 bytes | |
And that’s all there is to it!
One final caveat: one edits the manifest by deleting whole lines. It’s not possible to edit the lines themselves, so don’t change anything, nor can new lines be arbitrarily added to the manifest. Remember, manifests are like pointers in that they can only reference what already exists in the database dump.
Thank you for the article. I will use always parameter -Fc with pg_dump every time, now.
But i think, i have found a little mistake.
“pg_restore -1 -L pgbench_manifest.ini -f – | less -S”
Missing dump filename.
Fix:
“pg_restore -1 -L pgbench_manifest.ini -f – pgbench.db | less -S”
Hi Frank,
I’m glad this is something you can use.
The mistake I made was not explaining that this line is what I use to see what my dumps look like by piping it to less.
I love “less”. 🙂