Comments on: Why you can’t rely on a replica for disaster recovery

By: crownedgrouse

crownedgrouse — Thu, 07 Oct 2010 14:53:04 +0000

Hi,
Replication is not a backup. It is meant to provide high availability (HA), not disaster recovery (DR).

To be safe (at a minimum!):
1) use good hardware, mainly by doubling all devices (power supplies, etc.).
2) use RAIDx disks on a SAN (be careful with the write-back vs. write-through cache setting!)
3) use ZFS if possible, or at least a journaling filesystem.
4) use an ACID-compliant database (PostgreSQL, …) and its built-in replication feature if possible, for HA and DR (see 6).
5) snapshots can be useful sometimes, but rarely for databases, because the data can be inconsistent.
Don’t snapshot database directories unless you can tell the database that you are backing up or snapshotting the data. (PostgreSQL allows it: SELECT pg_start_backup('label'); before the snapshot, then SELECT pg_stop_backup(); after it.)
ZFS allows powerful snapshots that can be sent to another, remote node…
6) do regular database dumps to (compressed) files for snapshots and backups. This can be done on a replicated slave database to avoid loading the master.
7) do full and incremental backups on tapes, disks and optical disks. Keep them in a fireproof vault.
8) use version control for system configuration (/etc/, …) and protect it with integrity checking (Tripwire, …).
9) duplicate everything in at least two data centers in two different cities, or on two continents if possible.
Cross the backups: data center A is backed up to data center B and vice versa.
10) test data recovery and your disaster recovery plan (which implies you wrote one :>).
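Item 5’s advice (tell the database before you snapshot it) can be sketched as follows. This is a minimal Python sketch, not a complete backup tool: `execute` and `take_snapshot` are hypothetical callables you would wire to a database connection and to your storage, and the sketch uses the pre-15 exclusive-backup functions the comment names. PostgreSQL 15 removed exclusive backup mode; there the calls are pg_backup_start() and pg_backup_stop(), issued on the same connection. Either way, the WAL written during the backup must also be archived, or the snapshot will not restore.

```python
from typing import Callable


def snapshot_in_backup_mode(
    execute: Callable[[str], None],
    take_snapshot: Callable[[], None],
    label: str,
) -> None:
    """Wrap a filesystem/SAN snapshot in PostgreSQL's (pre-15) backup mode.

    `execute` (runs one SQL statement) and `take_snapshot` (triggers the
    storage snapshot) are hypothetical hooks supplied by the caller.
    """
    # Escape single quotes so the label is a valid SQL string literal.
    literal = "'" + label.replace("'", "''") + "'"
    # Tell PostgreSQL a base backup is starting (checkpoint + backup label).
    execute(f"SELECT pg_start_backup({literal})")
    try:
        take_snapshot()
    finally:
        # Always leave backup mode, even if the snapshot itself failed.
        execute("SELECT pg_stop_backup()")


# Usage with a fake executor that just records the statements:
log: list = []
snapshot_in_backup_mode(log.append, lambda: log.append("SNAPSHOT"), "nightly")
```

The try/finally is the important design choice: a snapshot that fails must not leave the database stuck in backup mode.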

And pray… if you are a believer.

By: Baron Schwartz

Baron Schwartz — Mon, 16 Aug 2010 18:13:05 +0000

InnoDB does support raw devices, and if used, then this type of problem is avoided. But it’s kind of nice to be able to use “cp” and “mv” and other tools. Most people don’t want to give that up. “Complete and total control” comes with limitations.

By: Andrew Watson

Andrew Watson — Mon, 16 Aug 2010 17:59:03 +0000

Why doesn’t MySQL support raw devices like Oracle? Then you wouldn’t have to rely on a filesystem driver for anything! You write the bits out to the device yourself and have complete and total control over that device.

By: Baron Schwartz

Baron Schwartz — Mon, 16 Aug 2010 12:51:18 +0000

Read previous comments, please.

By: Andreas

Andreas — Mon, 16 Aug 2010 10:51:40 +0000

InnoDB has a transaction log. Isn’t that log responsible for recovering from any glitches after a power outage?

By: Baron Schwartz

Baron Schwartz — Mon, 16 Aug 2010 10:45:22 +0000

In reply to Florian Haas. The unreliability of ext2 is just one specific example of a larger principle, in my opinion.

By: Florian Haas

Florian Haas — Mon, 16 Aug 2010 09:48:50 +0000

I know I’m a bit late in responding to this, but isn’t this really about “why you can’t rely on ext2 for anything”? I think pretty much everyone agrees that using a non-journaling filesystem is a really bad idea if you want any resilience in the face of a power outage at all, regardless of whether any sort of replication is involved.

Just my $.02.

By: Baron Schwartz

Baron Schwartz — Sun, 08 Aug 2010 03:05:33 +0000

Given a properly behaving operating environment, InnoDB does not corrupt your data, and it is better than most at detecting and recovering from many types of misbehavior. But if the database writes and fsync()s a file, and the filesystem botches the result good and proper, there is nothing the database CAN do.
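The point hinges on what “writes and fsync()s a file” does and does not promise. The sketch below shows the sequence a careful program performs; `durable_write` is a hypothetical helper name, and the directory fsync assumes a POSIX system (it will not work on Windows). Even this sequence only helps if the filesystem and storage underneath honor the flush.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write `data` to `path`, then ask the OS to make it durable.

    fsync() only promises the kernel has pushed the data to the storage
    stack; surviving a crash still depends on the filesystem and device.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush file data and metadata
    finally:
        os.close(fd)
    # Also fsync the parent directory so the directory entry is durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```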

For a different perspective on this topic, I suggest reading http://www.xaprb.com/blog/2010/02/08/how-postgresql-protects-against-partial-page-writes-and-data-corruption/.

By: Patrick Casey

Patrick Casey — Sun, 08 Aug 2010 01:37:18 +0000

Andreas, I think it actually depends on both the DB and the filesystem. The most common Linux filesystems like ext3 are journaling filesystems, and as such they very, very, very rarely get corrupted during a hard power-off, but it does happen. If your filesystem gets corrupted, it really doesn’t matter what kind of shape the database is in, because the underlying FS is munged and you can’t get your blocks back.

In practice, I can’t recall a snap of mine ever failing a recovery on ext3, but it’s not outside the realm of possibility.

The point, I suppose, is that modern filesystems almost always survive a crash, so we tend to take that kind of durability for granted, but it really isn’t guaranteed. The fact that a server comes back at all after an abrupt power loss is a testament to some very careful filesystem design. The fact that a database running on top of that filesystem recovers as well is still more good engineering, but both have to work to get your data back :).

By: Andreas

Andreas — Sun, 08 Aug 2010 01:03:39 +0000

Quote: A snapshot on the SAN is just the same as cutting the power to the machine: the block device is in an inconsistent state.

So you are telling me I can lose all my data just by cutting the power?
I won’t blame the filesystem here; I will blame the database.

By: Ed Walker

Ed Walker — Fri, 06 Aug 2010 05:14:29 +0000

Another beauty of MySQL replication that’s unfortunately also a curse: it’s single-threaded. Obscure bugs in Linux or MySQL caused by threads tripping over one another just don’t happen on the slave. I’m pretty sure this fact has saved me at least once. (The curse is that MySQL replication has no hope of keeping up with a busy master.)

By: Ed Walker

Ed Walker — Fri, 06 Aug 2010 04:51:25 +0000

The beauty of MySQL replication is that the DB copy is “hot”: there’s always a functioning database running against the copy. You can run queries, CHECK TABLE, whatever you want while replication is proceeding, to assure yourself the thing is going to work when it’s needed. That’s not true of low-level replication schemes.

I second that backing up a hot InnoDB database by “snapshotting” it works. You just let the copy run crash recovery and voilà, you have a consistent backup copy of your database. I don’t use “snapshots” per se; my experience with LVM snapshots has been disappointing in terms of performance. I use RAID-10 and split mirrors.

By: Nils

Nils — Thu, 05 Aug 2010 00:35:04 +0000

Also seems to me like a typical case of “we bought this expensive SAN so our data is safe”.

By: Jason J. W. Williams

Jason J. W. Williams — Wed, 04 Aug 2010 02:36:34 +0000

In case it’s not a completely horrible setup and is useful to someone else:

We run master/slave MySQL replication, with ZFS as the filesystem on both master and slave. We snapshot the filesystems on master and slave regularly. Three times a week we copy one of those snapshots off to a set of DataDomain backup servers.

By: Baron Schwartz

Baron Schwartz — Mon, 02 Aug 2010 16:29:59 +0000

Patrick, this customer was using ext2. I’ve never seen a problem with this kind of corruption on ext3, but ext2 doesn’t even TRY to handle things like remote snapshots.

By: Patrick Casey

Patrick Casey — Mon, 02 Aug 2010 16:21:12 +0000

It’s worth pointing out that my experience with snapshotting a hot InnoDB system, on ext3 at least, is that it *almost always* works.

We snapshot hot databases, mount the snapshots elsewhere, and run a recovery all the time, both via LVM and storage snapshots (NetApp).

In point of fact, I can’t recall ever having a snap we took this way fail the recovery process.

I keep reading about cases like this where it didn’t work, so I’m skeptical about proposals to use storage snapshots in lieu of backups (because, inevitably, the snapshot you need will be the one that doesn’t recover), but as a general-purpose tool to quickly clone databases, I think it’s phenomenally useful and vastly faster than a dump/restore.

The point, I suppose, is that it’s entirely possible the customer in this case did test this, even under heavy load, and it worked fine in testing. That doesn’t mean it was a bad test; it was probably more a failure of imagination (or research).

By: Baron Schwartz

Baron Schwartz — Mon, 02 Aug 2010 13:31:33 +0000

It would have to be a test restore under production load. I’d guess they did test, but maybe only before going into production. If the system were quiet, pdflush would flush dirty blocks to the SAN every 5 seconds, and soon the SAN’s copy of the data would be consistent too.
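Why a quiet system hides the problem can be illustrated with a toy model. This is an assumption-laden sketch, not how pdflush or any SAN actually works: host writes sit in a cache until a flusher pushes them out one block at a time, and a SAN-side snapshot sees only flushed blocks. `CachedDevice` and `consistent` are hypothetical names, and the “blocks 0 and 1 must match” invariant stands in for one logical update spanning two blocks.

```python
from dataclasses import dataclass, field


@dataclass
class CachedDevice:
    """Toy host page cache in front of a SAN volume."""
    san: dict = field(default_factory=dict)    # what the SAN has received
    dirty: dict = field(default_factory=dict)  # written by host, not flushed

    def write(self, block: int, value: str) -> None:
        self.dirty[block] = value

    def flush_one(self) -> bool:
        """Flush the lowest-numbered dirty block; return False if none."""
        if not self.dirty:
            return False
        block = min(self.dirty)
        self.san[block] = self.dirty.pop(block)
        return True

    def san_snapshot(self) -> dict:
        # The SAN can only copy what has been flushed, never host memory.
        return dict(self.san)


def consistent(snap: dict) -> bool:
    # Hypothetical invariant: blocks 0 and 1 carry the same version tag.
    return snap.get(0) == snap.get(1)


dev = CachedDevice(san={0: "v1", 1: "v1"})
dev.write(0, "v2")
dev.write(1, "v2")                     # one logical update, two blocks
dev.flush_one()                        # flusher has written block 0 only
print(consistent(dev.san_snapshot()))  # False: torn, like a power cut
while dev.flush_one():                 # a quiet system flushes everything
    pass
print(consistent(dev.san_snapshot()))  # True
```

Under steady production load there is almost always some dirty block in flight, so a snapshot taken at an arbitrary moment behaves like the first check, not the second.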

By: Morgan Tocker

Morgan Tocker — Mon, 02 Aug 2010 13:20:38 +0000

A backup isn’t a backup unless you’ve tried a recovery 😉 It is possible the ext2 weaknesses could have been discovered with a test restore.
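A test restore has to prove the restored copy is usable, not merely that the bytes were copied faithfully; in this story the copy was byte-faithful but internally inconsistent. The sketch below uses SQLite from the Python standard library as a stand-in database (an assumption); for MySQL the equivalent would be starting a server on the restored files and running CHECK TABLE plus application queries. `verify_restore` and the sanity query are hypothetical.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def verify_restore(backup_file: str, sanity_query: str) -> bool:
    """Restore a backup into a scratch directory and check it is usable."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "restored.db"
        shutil.copyfile(backup_file, restored)  # the "restore" step
        conn = sqlite3.connect(restored)
        try:
            # Structural check of every table, index and page.
            (status,) = conn.execute("PRAGMA integrity_check").fetchone()
            if status != "ok":
                return False
            # Application-level check: the data we expect is really there.
            (rows,) = conn.execute(sanity_query).fetchone()
            return rows > 0
        except sqlite3.DatabaseError:
            return False  # e.g. "file is not a database", missing table
        finally:
            conn.close()  # close before the scratch directory is removed
```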

By: Patrick Casey

Patrick Casey — Sun, 01 Aug 2010 15:23:30 +0000

If you’re looking at a backup solution, there are two (common) things you try to defend against:

1) The primary server eating itself/disappearing in a puff of smoke
2) Some sort of logical error in the application, or database corruption, damaging your data even though the server is technically just fine

Replication solves #1 *really well*.
Replication does not solve #2 at all since most common logical errors are happily replicated. As some have mentioned, running a delayed slave can mitigate this, but that only works if you notice the problem before the slave experiences it.
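The delayed-slave trade-off can be modeled in a few lines. This is a hypothetical sketch, not MySQL’s implementation: a replica delayed by `delay` seconds applies an event only once it is that old, so a mistake committed less than `delay` seconds ago has not reached it yet. (Later MySQL versions, 5.6 onward, support this natively via CHANGE MASTER TO MASTER_DELAY = N.)

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    committed_at: float  # seconds, on the master's clock
    statement: str


def apply_delayed(events: list[Event], now: float, delay: float) -> list[Event]:
    """Return the events a replica delayed by `delay` seconds has applied.

    Anything younger than `delay` is still pending, so if you notice a bad
    statement in that window you can stop replication before it lands.
    """
    return [e for e in events if now - e.committed_at >= delay]


log = [Event(0, "INSERT ..."), Event(3000, "DROP TABLE orders")]
# One hour after the INSERT, with a one-hour delay, the DROP is still
# pending for another 600 seconds: that is the window to react in.
applied = apply_delayed(log, now=3600, delay=3600)
```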

Traditional backups solve #1 weakly, since the restore time on a large database can be prohibitive.
Traditional backups solve #2 really well.

In the best of all possible worlds I usually recommend we run both, but depending on the business needs it doesn’t always make economic sense to run replication + backups.
In the enterprise environments I’m most familiar with the usual approach is to *always* run traditional backups, since they give absolute protection against both scenarios above, albeit with a high recovery time. For systems where we can’t afford the recovery time, we’ll also add in replication so that we can recover from some subset of failures quickly.

By: Tom

Tom — Sun, 01 Aug 2010 11:11:40 +0000

And this is why we use multiple off-site incremental backups and only do full replication periodically (without deleting old replicas). Something got messed up? Revert to a stable replica and apply some increments until you get the most recent stable version.
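Tom’s “revert to a stable replica and apply some increments” procedure amounts to choosing a restore chain. A minimal sketch, assuming each backup is tagged with a comparable point in time and that incrementals build on the most recent full copy before them; `Backup` and `restore_chain` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    taken_at: int  # timestamp or sequence number
    kind: str      # "full" or "incremental"
    name: str


def restore_chain(backups: list[Backup], target: int) -> list[Backup]:
    """Pick the backups needed to rebuild the state as of `target`.

    Start from the newest full copy at or before `target`, then apply every
    later incremental up to and including `target`, oldest first.
    """
    ordered = sorted(backups, key=lambda b: b.taken_at)
    fulls = [b for b in ordered if b.kind == "full" and b.taken_at <= target]
    if not fulls:
        raise ValueError("no full backup at or before target")
    base = fulls[-1]
    increments = [
        b for b in ordered
        if b.kind == "incremental" and base.taken_at < b.taken_at <= target
    ]
    return [base, *increments]
```

Keeping old full copies, as Tom does, is what lets `target` be a point before the corruption rather than only the latest state.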