DockerThis blog post reviews the appropriateness of Docker and other container solutions for your database environment.

A few weeks back, I wrote a fairly high-level blog post about containers. It covered what you should consider when thinking about using Docker, rkt, LXC, etc. I hope you’ve taken the chance to give it a quick read. It’s a good way to understand some of the disciplines you need to consider before moving to new technology. However, it sparked a conversation in our Solutions Engineering team. Hopefully, the same one that you’re having in your organization: should customers run their database in containers?

Before we start, I’ll admit that Percona uses containers. Percona Monitoring and Management (PMM for short) presents all of the pretty graphs and query analytics by running in a Docker container. We made that choice because the integration between the components is where we could provide the most value to users. Docker lets us distribute a single ready-to-go unit of awesomeness. In short, it has huge potential on the application side of your environment. 

However, for databases… here are some of our recommendations:

Quick n Dirty

Decision = NOT FOR DBs (as it sits right now)

This is not the case for every environment. It is the default that we think is the best recommendation for the majority of our customers. Please note, that I am only making this recommendation for your database. If you’re using microservices for your application today, then it could make more sense to containerize your database depending on the load characteristics of your database, your scaling needs and the skill set you currently have.

Why?

Lack of Synergy

Before you decide to shoot me, please take some time to understand where we’re coming from. First of all, people designed container solutions to deal with stateless applications that have ephemeral data. Containers spin up a quick microservice and then destroy it. This includes all the components of that container (including its cache and data). The transient nature of containers is because all of the components and services of that container are considered to be part of the container (essentially it’s all or nothing). Serving the container a data volume owned by the underlying OS by punching a hole through the container can be very challenging. Current methods are too unreliable for most databases.

Most of the development efforts put into the various solutions had one goal in mind: statelessness. There are solutions that can help keep your data persistent, but they are very quickly evolving. From what we can tell, they require a high level of complexity, that negate any efficiency gains due to increased operational complexity (and risk). To further my point, this is precisely the conclusion that we’ve come to time and again when we’ve reviewed any “real world” information about the use of containers (especially Docker).

They’re Just Not Stable Yet

These container solutions are meant for quick development and deployment of applications that are broken into tiny components: microservices. Normally, these applications evolve very quickly in organizations that are very software/developer-driven. That seems to be how these container solutions (again, especially Docker) are developed as well. New features are pushed out with little testing and design. The main focus seems to be the latest feature set and being first to market. They “beg for forgiveness” instead of “ask for permission.” On top of that, backward compatibility (from what we can tell) is a distant concern (and even that might be an overstatement). This means that you’re going to have to have a mature Continuous Delivery and testing environment as well as a known and tested image repository for your containers.

These are awesome tools to have for the right use cases, but they take time, money, resources and experience. In speaking with many of our customers, this is just not where they’re at as an organization. Their businesses aren’t designed around software development, and they simply don’t have the checkbooks to support the resources needed to keep this hungry machine fed. Rather, they are looking for something stable and performant that can keep their users happy 24×7. I know that we can give them a performant, highly-available environment requires much less management if we strip out containers.

Is There Hope?

Absolutely, in fact, there’s a lot more than hope. There are companies running containers (including databases) at massive scale today! These are the types of companies that have very mature processes. Their software development is a core part of their business plan and value proposition. You probably know who I’m talking about: Uber, Google, Facebook (there are more, these are just a few). There’s even a good rundown of how you can get persistence in containers from Joyent. But as I said before, the complexity needed to get the basic features necessary to keep your data alive and available (the most basic use of a database) is much too high. When containers have a better and more stable solution for persistent storage volumes, they will be one step closer to being ready, in my opinion. Even then, containerizing databases in most organizations that aren’t dealing with large scale deployments (50+ nodes) with wildly varying workloads is probably unnecessary.

Don’t’ Leave Us Hanging…

I realize that the statement “you’re probably not ready to containerize your database” does not constitute a solution. So here it is: the Solutions Engineering team (SolEng for short) has you covered. Dimitri Vanoverbeke is in the process of a great blog series on configuration management. Configuration management solutions can greatly increase the repeatability of your infrastructure, and make sure that your IT/App Dev processes are repeatable in the physical configuration of your environment. Automating this process can lead to great gains. However, this should make use of a mature development/testing process as part of your application development lifecycle. The marriage of process and technology creates stable applications and happy customers.

Besides configuration management as an enhanced solution, there are some services that can make the life of your operations team much easier. Service discovery and health checking come to mind. My favorite solution is Consul, which we use extensively in PMM for configuration and service metadata. Consul can make sure that your frontend applications and backend infrastructure are working from a real-time snapshot of the state of your services.

Conclusion

There is a lot to think about when it comes to managing an environment, especially when your application develops at a quick pace. With the crafty use of available solutions, you can reduce the overhead that goes into every release. On top of that, you can increase resiliency and availability. If you need our help, please reach out. We’d love to help you!

9 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Dave Anselmi

You make some good points regarding how typical Docker usage (microservices, quick create/destroy, etc) are not good wins for database deployment. Too much fragility of the infra is verboten for production database deployments.

However, if the container is designed for simplifying installation, i.e. a ‘containerized base-image’, and will be deployed as a single container per node, containers can be an excellent solution for DBA and DevOps simplicity.

With the right container orchestration, an otherwise tricky multi-node clustered database deployment can be reduced to a series of clicks.

In our experience, containers have been very stable for our customers’ production deployments (we’re using kubernetes containers, one container per node, on Amazon and bare-metal).
Dave A. Anselmi, Director of Product, ClustrixDB

eric Vanier

I will say that “Docker” or “Container” are good so far for deployment point of view only, I don’t see any of my clients using them for production used .

Ruichao Lin

Uber & 乐视 using docker for Database online.

Programmer

So what? But we use. )))

BLAH

Use a host volume mount for persistent data. How is this complex?

Peter Zaitsev

By itself it is not complex. It is failure scenarios where it becomes a lot more complex especially with classical MySQL replication where the node roles are not symmetrical

Darrell Breeden

Agreed. A volume is a single host solution unless you have shared backing storage (such as block storage / or File share). You can also use gluster to replicate, but replicating database systems in that manner would probably actually cripple gluster.

For testing scenarios its great, and devops vagrant setup, but I have not found a good use for databases in containers in production. There are too many management gains from having them separated and managed by a separate group.

ovaistariq

I don’t think I would rule out Docker for databases completely. We (Uber) are running it successfully in production with thousands of database instances co-located. Docker has its benefits and has its issues as well. However, if you have well engineered automation and knowledge to support it, then it can simplify operations and make infrastructure more efficient. The added benefits of immutable infrastructure and being able to have the developers mimic production environment on their workstations are very difficult to just look over.

Nawaz

Is this still valid considering the recent developments for docker and enhancements?