Partitioning, Free Lunches, and Indexing

Why partition?
Partitioning is a commonly touted method for achieving performance in MySQL and other databases. (See here, here, here and many other examples.) I started wondering where the performance from partitions comes from, and I’ve summarized some of my thoughts here.
But first, what is partitioning? (I’ve taken the examples from Giuseppe Maxia’s Partitions in Practice intro.)
CREATE TABLE by_year ( d DATE ) PARTITION BY RANGE (YEAR(d)) ( PARTITION P1 VALUES LESS THAN (2001), PARTITION P2 VALUES LESS THAN (2002), PARTITION P3 VALUES LESS THAN (2003), PARTITION P4 VALUES LESS THAN (MAXVALUE) )
1
2
3
4
5
6
7
8
9
10
CREATE TABLE by_year (
   d DATE
)
PARTITION BY RANGE (YEAR(d))
(
   PARTITION P1 VALUES LESS THAN (2001),
   PARTITION P2 VALUES LESS THAN (2002),
   PARTITION P3 VALUES LESS THAN (2003),
   PARTITION P4 VALUES LESS THAN (MAXVALUE)
)
In this simple example, you get four partitions, depending on the year component of the date field. This is a simple example, so it’s easy to see what’s going on without getting bogged down in the increased complexity. But increased complexity is a fact of life with partitioning — What happens when you want a per-week partition for the last two years and need to manage 104 partitions? Yuck! — so there needs to be a compelling reason to partition. Which brings us to the question: What are partitions good for?

Bulk Deleting

If you’ve partitioned your table, then it is possible to drop a partition, and thus delete a chunk of data very quickly. For example

ALTER TABLE by-year DROP PARTITION P1;

1	ALTER TABLE by-year DROP PARTITION P1;

B-tree-based storage engines, like InnoDB, are slow to delete data, and deletions fragment the table, with consequences to performance. Dropping a partition makes deletions fast and does not fragment the table. On the other hand, for Fractal Tree storage engines, like TokuDB, deletions are fast — though not as fast as dropping a partition — and Fractal Tree indexes don’t fragment.

So partition dropping gives a clear benefit, if you’re going to use InnoDB. In the case of Fractal Tree indexes, instant partition dropping is still nice, but the case for partitions is not as strong, since it’s not clear that complexity is worth it.

No Free Lunches or The Fallacy of “Divide and Conquer”

Leaving deletions aside, the most common reason I see mentioned for using partitions is the claim that partitioning breaks big tables into smaller tables. Smaller tables are faster, right? Well, no. A collection of small tables that you have to swap in and out is not necessarily faster than a single table, parts of which you have to swap in and out. Contrary to popular belief, partitioning is not an example of “Divide and Conquer”. [For Divide and Conquer to work, you split up a problem and then the solutions to each piece are independent of each other. Virtual memory means that the “solutions” to managing the various indexes of the partitioned pieces are not independent.]

I think of it this way: you are just swapping one data structure on the table for another. In one case, you have a single table managed by a B-tree, Fractal Tree index, or whatever, and you have virtual memory moving parts of the structure in and out of memory. In the other case, you have a hybrid data structure, with trees at the bottom, and whatever MySQL uses to manage partitions at the top. And virtual memory still moves parts of this hybrid structure in and out of memory.

More concretely: suppose you partition a table into pieces, each of which fits in memory. That should be good, right? Well, let’s consider two extreme cases. First, suppose that your insertion/query use case hammers on one partition at a time. You swap in that partition, and you run fast because everything’s in memory. But if you hadn’t partitioned, you’d still swap in the active part of the table into memory and go fast.

At the other extreme, suppose you jump from partition to partition. Then you have to keep swapping in bits of different partitions. Without partitioning, you keep swapping in bits of the index tree.

In short, there’s no such thing as a free lunch. Partitioning just replaces one data structure on a table for another, and there’s no reason to believe that the new data structure is better.

So what’s the main advantage of partitioning?

This is what the next installment will be about. Stay tuned.

1 Comment

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Justin Swanhart

13 years ago

If you have a table with a long tail, which is rarely deleted from and old data is rarely accessed, then partitioning keeps the indexes on the hot partitions smaller which is results in better performance. This is quite common, and the situation in which I most often suggest partitioning.

You can also use innodb_file_per_table w/ symlinks (really only safe with the facebook patch) to place partitions on different IO subsystems. If you execute queries in parallel (see http://code.google.com/p/shard-query) then you can get much better performance using partitions than not using them. This is an example of ‘divide and conquer’.

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Partitioning, Free Lunches, and Indexing

Bulk Deleting

No Free Lunches or The Fallacy of “Divide and Conquer”

So what’s the main advantage of partitioning?

Related

Related Blog Articles

RECOMMENDED ARTICLES

Can We Set up a Replicate Filter Within the Percona XtraDB Cluster?

Choosing the Right Database: Comparing MariaDB vs. MySQL, PostgreSQL, and MongoDB

Seamless Table Modifications: Leveraging pt-online-schema-change for Online Alterations

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Partitioning, Free Lunches, and Indexing

Bulk Deleting

No Free Lunches or The Fallacy of “Divide and Conquer”

So what’s the main advantage of partitioning?

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Can We Set up a Replicate Filter Within the Percona XtraDB Cluster?

Choosing the Right Database: Comparing MariaDB vs. MySQL, PostgreSQL, and MongoDB

Seamless Table Modifications: Leveraging pt-online-schema-change for Online Alterations

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation