This blog was originally published in July 2023 and was updated in April 2024.
Partitioning is the practice of logically splitting a large table into smaller pieces for better database performance. In this article, we’ll discuss how to use PostgreSQL’s built-in partitioning methods to split tables effectively and improve database performance.
Built-in PostgreSQL partitioning methods
- Range partitioning
- List partitioning
- Hash partitioning
When to use partitioning in PostgreSQL
- Bulk operations like data loads and deletes can be performed efficiently using the ATTACH PARTITION and DETACH PARTITION features.
- The exact point at which a table will benefit from partitioning depends on the application. However, a rule of thumb is that the size of the table should exceed the physical memory of the database server.
- As data grows, sub-partitions can be created to further enhance performance, and old partitions can be removed either by detaching them as standalone tables or by dropping them entirely.
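As a sketch of that ATTACH/DETACH workflow (using the quarterly employees table created later in this post; the 2023 table and constraint names are illustrative), an old quarter can be detached for archival while a new quarter is bulk-loaded into a standalone table and then attached:

```sql
-- Detach an old partition; it becomes a standalone table that can be archived.
ALTER TABLE employees DETACH PARTITION employees_q1;

-- Bulk-load the next quarter into a standalone table, then attach it.
CREATE TABLE employees_2023_q1 (LIKE employees INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
-- ... COPY / INSERT the new data here ...

-- An explicit CHECK constraint lets ATTACH PARTITION skip the validation scan.
ALTER TABLE employees_2023_q1
    ADD CONSTRAINT employees_2023_q1_check
    CHECK (joined >= '2023-01-01' AND joined < '2023-04-01');
ALTER TABLE employees ATTACH PARTITION employees_2023_q1
    FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');
```

Because ATTACH and DETACH are metadata operations, they are far cheaper than loading or deleting the same rows through the parent table.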
Benefits of partitioning in PostgreSQL
- Query performance for SELECT and DML operations can be improved in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions, as explained below.
- When queries or updates access a large percentage of a single partition, performance can be improved by using a sequential scan of that partition instead of using an index, which would require random-access reads scattered across the whole table.
- Dropping or truncating a partition can be done using DROP TABLE and TRUNCATE TABLE, respectively, avoiding the load of bulk DELETE operations.
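For instance (using the quarterly employees partitions created later in this post), removing an entire quarter is a cheap metadata operation rather than a row-by-row DELETE:

```sql
-- Remove all rows of one quarter instantly instead of DELETE ... WHERE:
TRUNCATE TABLE employees_q1;

-- Or drop the partition, and its data, entirely:
DROP TABLE employees_q1;
```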
Range partitioning
Range partitioning is a partitioning method in which the table is split on ranges of a column’s values, such as dates or numeric values.
Here, as an example, I created a range-partitioned table on a date column, with one partition per quarter.
```sql
CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(20),
    lname VARCHAR(20),
    dob DATE NOT NULL,
    joined DATE NOT NULL
) PARTITION BY RANGE (joined);
```
```sql
CREATE TABLE employees_q1 PARTITION OF employees FOR VALUES FROM ('2022-01-01') TO ('2022-04-01');
CREATE TABLE employees_q2 PARTITION OF employees FOR VALUES FROM ('2022-04-01') TO ('2022-07-01');
CREATE TABLE employees_q3 PARTITION OF employees FOR VALUES FROM ('2022-07-01') TO ('2022-10-01');
CREATE TABLE employees_q4 PARTITION OF employees FOR VALUES FROM ('2022-10-01') TO ('2023-01-01');
```
The range partitions are seen in the table structure below.
```
\d+ employees
                                              Partitioned table "public.employees"
 Column |         Type          | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 id     | integer               |           | not null |         | plain    |             |              |
 fname  | character varying(20) |           |          |         | extended |             |              |
 lname  | character varying(20) |           |          |         | extended |             |              |
 dob    | date                  |           | not null |         | plain    |             |              |
 joined | date                  |           | not null |         | plain    |             |              |
Partition key: RANGE (joined)
Partitions: employees_q1 FOR VALUES FROM ('2022-01-01') TO ('2022-04-01'),
            employees_q2 FOR VALUES FROM ('2022-04-01') TO ('2022-07-01'),
            employees_q3 FOR VALUES FROM ('2022-07-01') TO ('2022-10-01'),
            employees_q4 FOR VALUES FROM ('2022-10-01') TO ('2023-01-01')
```
Next, I inserted a year’s worth of random data, one row for each of the 365 days of 2022.
```sql
INSERT INTO employees (id, fname, lname, dob, joined)
VALUES (
    generate_series(1, 365),
    (array['Oswald', 'Henry', 'Bob', 'Vennie'])[floor(random() * 4 + 1)],
    (array['Leo', 'Jack', 'Den', 'Daisy', 'Woody'])[floor(random() * 5 + 1)],
    '1995-01-01'::date + trunc(random() * 366 * 3)::int,
    generate_series('1/1/2022'::date, '12/31/2022'::date, '1 day')
);
```
The data is distributed among the range partitions as shown below.
```sql
SELECT employees_q1, employees_q2, employees_q3, employees_q4, employees_totalcnt
FROM
    ( SELECT COUNT(*) FROM employees_q1 ) AS employees_q1,
    ( SELECT COUNT(*) FROM employees_q2 ) AS employees_q2,
    ( SELECT COUNT(*) FROM employees_q3 ) AS employees_q3,
    ( SELECT COUNT(*) FROM employees_q4 ) AS employees_q4,
    ( SELECT COUNT(*) FROM employees ) AS employees_totalcnt;

 employees_q1 | employees_q2 | employees_q3 | employees_q4 | employees_totalcnt
--------------+--------------+--------------+--------------+--------------------
 (90)         | (91)         | (92)         | (92)         | (365)
(1 row)
```
Performance of SELECT queries
Here, I created a table without partitioning and inserted the same data as in the partitioned table.
When a query touches only a single partition or a few partitions, the planner produces a cheaper plan for the partitioned table, as the comparison below shows.
```sql
CREATE TABLE employees_nopartition (
    id INT NOT NULL,
    fname VARCHAR(20),
    lname VARCHAR(20),
    dob DATE NOT NULL,
    joined DATE NOT NULL
);
```
```sql
INSERT INTO employees_nopartition (id, fname, lname, dob, joined)
VALUES (
    generate_series(1, 365),
    (array['Oswald', 'Henry', 'Bob', 'Vennie'])[floor(random() * 4 + 1)],
    (array['Leo', 'Jack', 'Den', 'Daisy', 'Woody'])[floor(random() * 5 + 1)],
    '1995-01-01'::date + trunc(random() * 366 * 3)::int,
    generate_series('1/1/2022'::date, '12/31/2022'::date, '1 day')
);
```
```
EXPLAIN select * from employees_nopartition where joined >= '2022-05-12' and joined < '2022-06-10';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on employees_nopartition  (cost=0.00..8.47 rows=29 width=22)
   Filter: ((joined >= '2022-05-12'::date) AND (joined < '2022-06-10'::date))
(2 rows)
```
Here we can see that, thanks to partition pruning, only the employees_q2 partition is scanned, giving a cheaper plan than the sequential scan over the entire non-partitioned table.
```
EXPLAIN select * from employees where joined >= '2022-05-12' and joined < '2022-06-10';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on employees_q2  (cost=0.00..2.37 rows=29 width=22)
   Filter: ((joined >= '2022-05-12'::date) AND (joined < '2022-06-10'::date))
(2 rows)
```
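Partition pruning is what keeps the other three quarters out of this plan. As a quick sketch, the behavior can be toggled with the enable_partition_pruning setting (available since PostgreSQL 11) to compare plans with and without pruning:

```sql
-- Partition pruning is on by default.
SHOW enable_partition_pruning;

-- Temporarily disable it to watch the planner scan every partition:
SET enable_partition_pruning = off;
EXPLAIN SELECT * FROM employees WHERE joined >= '2022-05-12' AND joined < '2022-06-10';

-- Re-enable it for normal operation.
SET enable_partition_pruning = on;
```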
List partitioning
List partitioning is a partitioning method based on a set of discrete key values. The partition key can also be an expression on a column, as in PARTITION BY LIST (expression), which is explained below.
For example, I created a list-partitioned table with a few partitions and inserted 1,000 rows of random data.
```sql
CREATE TABLE sales (
    id INT NOT NULL,
    branch VARCHAR(3),
    type TEXT,
    amount INT
) PARTITION BY LIST (branch);
```
```sql
CREATE TABLE HYD_sales PARTITION OF sales FOR VALUES IN ('HYD');
CREATE TABLE BLR_sales PARTITION OF sales FOR VALUES IN ('BLR');
CREATE TABLE DEL_sales PARTITION OF sales FOR VALUES IN ('DEL');
CREATE TABLE TPT_sales PARTITION OF sales FOR VALUES IN ('TPT');
```
```sql
INSERT INTO sales (id, branch, type, amount)
VALUES (
    generate_series(1, 1000),
    (array['HYD', 'BLR', 'DEL', 'TPT'])[floor(random() * 4 + 1)],
    (array['Laptops', 'Printers', 'Hardisks', 'Desktops', 'Monitors'])[floor(random() * 5 + 1)],
    (random() * 200000)::int
);
```
List partitions are seen in the table definition below:
```
\d+ sales
                                   Partitioned table "public.sales"
 Column |         Type         | Collation | Nullable | Default | Storage  | Stats target | Description
--------+----------------------+-----------+----------+---------+----------+--------------+-------------
 id     | integer              |           | not null |         | plain    |              |
 branch | character varying(3) |           |          |         | extended |              |
 type   | text                 |           |          |         | extended |              |
 amount | integer              |           |          |         | plain    |              |
Partition key: LIST (branch)
Partitions: blr_sales FOR VALUES IN ('BLR'),
            del_sales FOR VALUES IN ('DEL'),
            hyd_sales FOR VALUES IN ('HYD'),
            tpt_sales FOR VALUES IN ('TPT')
```
Partitioned data distributed among its partitions is seen below:
```sql
SELECT blr_sales, del_sales, hyd_sales, tpt_sales, total_cnt
FROM
    ( SELECT COUNT(*) FROM blr_sales ) AS blr_sales,
    ( SELECT COUNT(*) FROM del_sales ) AS del_sales,
    ( SELECT COUNT(*) FROM hyd_sales ) AS hyd_sales,
    ( SELECT COUNT(*) FROM tpt_sales ) AS tpt_sales,
    ( SELECT COUNT(*) FROM sales ) AS total_cnt;

 blr_sales | del_sales | hyd_sales | tpt_sales | total_cnt
-----------+-----------+-----------+-----------+-----------
 (262)     | (258)     | (228)     | (252)     | (1000)
(1 row)
```
List partitioning using expression
For example, I created a table with list partitioning using an expression on a column.
```sql
CREATE TABLE donors (
    id INT NOT NULL,
    name VARCHAR(20),
    bloodgroup VARCHAR(15),
    last_donated DATE,
    contact_num VARCHAR(10)
) PARTITION BY LIST (left(upper(bloodgroup), 3));
```
```sql
CREATE TABLE A_positive  PARTITION OF donors FOR VALUES IN ('A+ ');
CREATE TABLE A_negative  PARTITION OF donors FOR VALUES IN ('A- ');
CREATE TABLE B_positive  PARTITION OF donors FOR VALUES IN ('B+ ');
CREATE TABLE B_negative  PARTITION OF donors FOR VALUES IN ('B- ');
CREATE TABLE AB_positive PARTITION OF donors FOR VALUES IN ('AB+');
CREATE TABLE AB_negative PARTITION OF donors FOR VALUES IN ('AB-');
CREATE TABLE O_positive  PARTITION OF donors FOR VALUES IN ('O+ ');
CREATE TABLE O_negative  PARTITION OF donors FOR VALUES IN ('O- ');
```
List partitions are seen in the table definition below:
```
\d+ donors
                                                 Partitioned table "public.donors"
    Column    |         Type          | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------------+-----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 id           | integer               |           | not null |         | plain    |             |              |
 name         | character varying(20) |           |          |         | extended |             |              |
 bloodgroup   | character varying(15) |           |          |         | extended |             |              |
 last_donated | date                  |           |          |         | plain    |             |              |
 contact_num  | character varying(10) |           |          |         | extended |             |              |
Partition key: LIST ("left"(upper((bloodgroup)::text), 3))
Partitions: a_negative FOR VALUES IN ('A- '),
            a_positive FOR VALUES IN ('A+ '),
            ab_negative FOR VALUES IN ('AB-'),
            ab_positive FOR VALUES IN ('AB+'),
            b_negative FOR VALUES IN ('B- '),
            b_positive FOR VALUES IN ('B+ '),
            o_negative FOR VALUES IN ('O- '),
            o_positive FOR VALUES IN ('O+ ')
```
Here, I inserted 100 random rows.
```sql
INSERT INTO donors (id, name, bloodgroup, last_donated, contact_num)
VALUES (
    generate_series(1, 100),
    'user_' || trunc(random() * 100),
    (array['A+ group', 'A- group', 'O- group', 'O+ group', 'AB+ group', 'AB- group', 'B+ group', 'B- group'])[floor(random() * 8 + 1)],
    '2022-01-01'::date + trunc(random() * 366 * 1)::int,
    CAST(1000000000 + floor(random() * 9000000000) AS bigint)
);
```
The data is distributed among the expression-based list partitions as shown below:
```sql
SELECT a_negative, a_positive, ab_negative, ab_positive, b_negative, b_positive, o_negative, o_positive, total_cnt
FROM
    ( SELECT COUNT(*) FROM a_negative ) AS a_negative,
    ( SELECT COUNT(*) FROM a_positive ) AS a_positive,
    ( SELECT COUNT(*) FROM ab_negative ) AS ab_negative,
    ( SELECT COUNT(*) FROM ab_positive ) AS ab_positive,
    ( SELECT COUNT(*) FROM b_negative ) AS b_negative,
    ( SELECT COUNT(*) FROM b_positive ) AS b_positive,
    ( SELECT COUNT(*) FROM o_positive ) AS o_positive,
    ( SELECT COUNT(*) FROM o_negative ) AS o_negative,
    ( SELECT COUNT(*) FROM donors ) AS total_cnt;

 a_negative | a_positive | ab_negative | ab_positive | b_negative | b_positive | o_negative | o_positive | total_cnt
------------+------------+-------------+-------------+------------+------------+------------+------------+-----------
 (9)        | (19)       | (10)        | (12)        | (12)       | (10)       | (18)       | (10)       | (100)
(1 row)
```
Performance of DML operations
To compare DML performance, I created a table without partitioning and inserted the same 1,000 rows of random data as in the partitioned sales table.
```sql
CREATE TABLE sales_nopartition (
    id INT NOT NULL,
    branch VARCHAR(3),
    type TEXT,
    amount INT
);
```
```sql
INSERT INTO sales_nopartition (id, branch, type, amount)
VALUES (
    generate_series(1, 1000),
    (array['HYD', 'BLR', 'DEL', 'TPT'])[floor(random() * 4 + 1)],
    (array['Laptops', 'Printers', 'Hardisks', 'Desktops', 'Monitors'])[floor(random() * 5 + 1)],
    (random() * 200000)::int
);
```
UPDATE Query Performance
```
EXPLAIN update sales_nopartition set type = 'Smart Watches' where branch = 'HYD';
                                QUERY PLAN
---------------------------------------------------------------------------
 Update on sales_nopartition  (cost=0.00..19.50 rows=229 width=50)
   ->  Seq Scan on sales_nopartition  (cost=0.00..19.50 rows=229 width=50)
         Filter: ((branch)::text = 'HYD'::text)
(3 rows)
```
```
EXPLAIN update sales set type = 'Smart Watches' where branch = 'HYD';
                            QUERY PLAN
------------------------------------------------------------------
 Update on sales  (cost=0.00..5.10 rows=248 width=50)
   Update on hyd_sales
   ->  Seq Scan on hyd_sales  (cost=0.00..5.10 rows=248 width=50)
         Filter: ((branch)::text = 'HYD'::text)
(4 rows)
```
DELETE Query Performance
```
EXPLAIN DELETE from sales_nopartition where branch='HYD';
                                QUERY PLAN
--------------------------------------------------------------------------
 Delete on sales_nopartition  (cost=0.00..19.50 rows=229 width=6)
   ->  Seq Scan on sales_nopartition  (cost=0.00..19.50 rows=229 width=6)
         Filter: ((branch)::text = 'HYD'::text)
(3 rows)
```
```
EXPLAIN DELETE from sales where branch='HYD';
                           QUERY PLAN
-----------------------------------------------------------------
 Delete on sales  (cost=0.00..5.10 rows=248 width=6)
   Delete on hyd_sales
   ->  Seq Scan on hyd_sales  (cost=0.00..5.10 rows=248 width=6)
         Filter: ((branch)::text = 'HYD'::text)
(4 rows)
```
The examples above show that for UPDATE and DELETE operations touching a single partition, the planner produces a cheaper plan against the partitioned table than against the non-partitioned one.
Hash partitioning
In hash partitioning, the table is partitioned by specifying a modulus and a remainder for each partition.
- Each partition holds the rows for which the hash value of the partition key, divided by the specified modulus, produces the specified remainder.
- Hash partitioning works best when each partition is placed in a different tablespace residing on a separate physical disk, so that I/O is spread evenly across more devices.
For example, I created a hash-partitioned table with five partitions, using a modulus of five.
```sql
CREATE TABLE students (
    id INT NOT NULL,
    name VARCHAR(30) NOT NULL,
    course VARCHAR(100),
    joined DATE
) PARTITION BY HASH (id);
```
```sql
CREATE TABLE student_0 PARTITION OF students FOR VALUES WITH (MODULUS 5, REMAINDER 0);
CREATE TABLE student_1 PARTITION OF students FOR VALUES WITH (MODULUS 5, REMAINDER 1);
CREATE TABLE student_2 PARTITION OF students FOR VALUES WITH (MODULUS 5, REMAINDER 2);
CREATE TABLE student_3 PARTITION OF students FOR VALUES WITH (MODULUS 5, REMAINDER 3);
CREATE TABLE student_4 PARTITION OF students FOR VALUES WITH (MODULUS 5, REMAINDER 4);
```
The table structure looks like the one below with five created partitions:
```
\d+ students
                                    Partitioned table "public.students"
 Column |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description
--------+------------------------+-----------+----------+---------+----------+--------------+-------------
 id     | integer                |           | not null |         | plain    |              |
 name   | character varying(30)  |           | not null |         | extended |              |
 course | character varying(100) |           |          |         | extended |              |
 joined | date                   |           |          |         | plain    |              |
Partition key: HASH (id)
Partitions: student_0 FOR VALUES WITH (modulus 5, remainder 0),
            student_1 FOR VALUES WITH (modulus 5, remainder 1),
            student_2 FOR VALUES WITH (modulus 5, remainder 2),
            student_3 FOR VALUES WITH (modulus 5, remainder 3),
            student_4 FOR VALUES WITH (modulus 5, remainder 4)
```
Here, I inserted 100,000 rows of random data.
```sql
INSERT INTO students (id, name, course, joined)
VALUES (
    generate_series(1, 100000),
    'student_' || trunc(random() * 1000),
    (array['Finance & Accounts', 'Business Statistics', 'Environmental Science'])[floor(random() * 3 + 1)],
    '2019-01-01'::date + trunc(random() * 366 * 3)::int
);
```
The hash-partitioned data is distributed among the partitions as shown below.
```sql
SELECT relname, reltuples AS rows
FROM pg_class
WHERE relname IN ('student_0', 'student_1', 'student_2', 'student_3', 'student_4')
ORDER BY relname;

  relname  | rows
-----------+-------
 student_0 | 19851
 student_1 | 20223
 student_2 | 19969
 student_3 | 19952
 student_4 | 20005
(5 rows)
```
Benefits of hash partitioning
- The primary benefit is to ensure an even distribution of data among a predetermined number of partitions.
- Hash keys work well in cases where natural ranges or lists are not applicable, such as employee numbers or product IDs.
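To spread I/O across disks as mentioned earlier, each hash partition can be placed in its own tablespace. A minimal sketch, assuming a hypothetical orders table; the tablespace names and paths are illustrative and the directories must already exist on the server:

```sql
-- Tablespaces on separate physical disks (paths are illustrative):
CREATE TABLESPACE ts_disk1 LOCATION '/mnt/disk1/pgdata';
CREATE TABLESPACE ts_disk2 LOCATION '/mnt/disk2/pgdata';

CREATE TABLE orders (id INT NOT NULL, amount INT) PARTITION BY HASH (id);

-- Each partition lands on its own disk, dividing the I/O between devices.
CREATE TABLE orders_0 PARTITION OF orders
    FOR VALUES WITH (MODULUS 2, REMAINDER 0) TABLESPACE ts_disk1;
CREATE TABLE orders_1 PARTITION OF orders
    FOR VALUES WITH (MODULUS 2, REMAINDER 1) TABLESPACE ts_disk2;
```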
What if the data is out of range or list?
For this purpose, we use default partitions on range- and list-partitioned tables.
For both range and list partitioning, out-of-range (or out-of-list) data can be stored temporarily in a default partition and moved later, once an appropriate partition has been created.
Hash-partitioned tables may not have a default partition; because every possible key value hashes to one of the defined partitions, a default partition would serve no purpose.
Let’s see what happens when we try to insert a row for which no partition exists, and how a default partition helps in this case.
```
INSERT INTO sales VALUES (1001, 'MYS', 'Scanners', 190000);
ERROR:  no partition of relation "sales" found for row
DETAIL:  Partition key of the failing row contains (branch) = (MYS).
```
```
CREATE TABLE sales_default PARTITION OF sales DEFAULT;
CREATE TABLE

INSERT INTO sales VALUES (1001, 'MYS', 'Scanners', 190000);
INSERT 0 1
```
```
select * from sales_default;
  id  | branch |   type   | amount
------+--------+----------+--------
 1001 | MYS    | Scanners | 190000
(1 row)
```
So the row we inserted landed in the default partition; the proper partition can be created later, and the rows moved out of the default partition.
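One sketch of that cleanup, continuing the 'MYS' example above: the new partition cannot be created while matching rows still sit in the default partition, so the rows are moved inside a transaction:

```sql
BEGIN;
-- Detach so the stranded rows don't block creation of the new partition.
ALTER TABLE sales DETACH PARTITION sales_default;

CREATE TABLE mys_sales PARTITION OF sales FOR VALUES IN ('MYS');

-- Route the stranded rows into the new partition, then re-attach the default.
INSERT INTO sales SELECT * FROM sales_default WHERE branch = 'MYS';
DELETE FROM sales_default WHERE branch = 'MYS';
ALTER TABLE sales ATTACH PARTITION sales_default DEFAULT;
COMMIT;
```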
Learn More About PostgreSQL Partitioning with our Free Resources
Here, we discussed the built-in partitioning techniques in PostgreSQL using single columns. PostgreSQL Partition Manager (pg_partman) can also be used to create and manage partitions effectively; multi-column partitioning and further details will be explained in upcoming blogs.
Also, please find below the related blogs for reference:
PostgreSQL Sharding: An Overview and MongoDB Comparison
Performing ETL Using Inheritance in PostgreSQL
Partitioning in PostgreSQL With pg_partman (Serial-Based & Trigger-Based)
PostgreSQL Partitioning Made Easy Using pg_partman (TimeBased)
Our PostgreSQL Performance Tuning guide condenses years of database expertise into a practical guide for optimizing your PostgreSQL databases. Get it today:
Elevate your PostgreSQL Performance
FAQs
What is the difference between partitioning and indexing in PostgreSQL?
In PostgreSQL, partitioning involves dividing a large table into smaller, more manageable pieces, known as partitions, based on certain criteria like ranges or list values. This can significantly improve query performance on large datasets and simplify data management. Indexing, on the other hand, involves creating a data structure that improves the speed of data retrieval operations on a table. While indexing speeds up query processing by providing quick lookups, partitioning optimizes performance by limiting the number of rows to scan.
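As an illustration of how the two combine (using the sales table from this post; the index name is arbitrary), an index created on a partitioned parent table automatically cascades to every partition:

```sql
-- Creating an index on the partitioned parent creates it on all partitions.
CREATE INDEX idx_sales_branch ON sales (branch);

-- The planner can prune to one partition, then use that partition's index:
EXPLAIN SELECT * FROM sales WHERE branch = 'HYD';
```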
What is the difference between sharding and partitioning in PostgreSQL?
Sharding and partitioning in PostgreSQL both deal with distributing data but in subtly different ways. Sharding typically refers to distributing data across multiple databases or servers, potentially across different physical locations, to balance load and improve performance. Partitioning, however, occurs within a single database and involves dividing a table into segments that are easier to manage and query. While both techniques can help with scaling, sharding is more about the horizontal distribution of data across a cluster, whereas partitioning is about organizing data within a database.
When is it most appropriate to partition a database?
Partitioning a database is most appropriate when dealing with large tables that significantly impact query performance and maintenance tasks. It is particularly beneficial when a database frequently executes queries that scan large segments of data based on specific keys or ranges, such as dates or geographic locations. Partitioning can also be advantageous when archival operations are routine, allowing for easier management of historical data by segregating older entries into separate partitions.