SELinux for PXC security

Why do I spend time blogging about security frameworks? Because, although there are some resources available on the Web, none apply to Percona XtraDB Cluster (PXC) directly. Actually, I rarely encounter a MySQL setup where SELinux is enforced and never when Percona XtraDB Cluster (PXC) or another Galera replication implementation is used. As we’ll see, there are good reasons for that. I originally thought this post would be a simple “how to” but it ended up with a push request to modify the SST script and a few other surprises.

Some context

These days, with all the major security breaches of the last few years, the importance of security in IT cannot be highlighted enough. For that reason, security in MySQL has been progressively tightened from version to version and the default parameters are much more restrictive than they used to be. That’s all good but it is only at the MySQL level if there is still a breach allowing access to MySQL, someone could in theory do everything the mysql user is allowed to do. To prevent such a situation, the operations that mysqld can do should be limited to only what it really needs to do. SELinux’ purpose is exactly that. You’ll find SELinux on RedHat/Centos and their derived distributions. Debian, Ubuntu and OpenSuse uses another framework, AppArmor, which is functionally similar to SELinux. I’ll talk about AppArmor in a future post, let’s focus for now on SELinux.

The default behavior of many DBAs and Sysadmins appears to be: “if it doesn’t work, disable SELinux”. Sure enough, it often solves the issue but it also removes an important security layer. I believe disabling SELinux is the wrong cure so let’s walk through the steps of configuring a PXC cluster with SELinux enforced.

Starting point

As a starting point, I’ll assume you have a running PXC cluster operating with SELinux in permissive mode. That likely means the file “/etc/sysconfig/selinux” looks like this:

For the purpose of writing this article, I created a 3 nodes PXC cluster with the hosts: BlogSELinux1, BlogSELinux2 and BlogSELinux3. On BlogSELinux1, I set SELinux in permissive mode, I truncated the audit.log. SELinux violations are logged in the audit.log file.

Let’s begin by covering the regular PXC operation items like start, stop, SST Donor, SST Joiner, IST Donor and IST Joiner. As we execute the steps in the list, the audit.log file will record SELinux related elements.

Stop and start

Those are easy:

SST Donor

On BlogSELinux3:

then on BlogSELinux2:

SST Joiner

We have BlogSELinux1 and BlogSELinux2 up and running, we just do:

IST Donor

We have BlogSELinux1 and BlogSELinux2 up and running, we just do:

Then on the first node:

Those statements put some data in the gcache, now we just restart the second node:

IST Joiner

We have BlogSELinux1 and BlogSELinux2 up and running, we just do:

Then on the second node:

to insert some data in the gcache and we restart the first node:

First run

Now that we performed the basic operations of a cluster while recording the security violations in permissive mode, we can look at the audit.log file and start building the SELinux policy. Let’s begin by installing the tools needed to manipulate the SELinux audit log and policy files with:

Then, we’ll use the audit2allow tool to analyze the audit.log file:

We end up with 2 files, PXC.te and PXC.pp. The pp file is a compiled version of the human readable te file. If we examine the content of the PXC.te file, at the beginning, we have the require section listing all the involved SELinux types and classes:

Then, using these types and classes, the policy file adds a series of generic allow rules matching the denied found in the audit.log file. Here’s what I got:

I can understand some of these rules. For example, one of the TCP ports used by Kerberos is 4444 and it is also used by PXC for the SST transfer. Similarly, MySQL needs to write to /tmp. But what about all the other rules?

Troubleshooting

We could load the PXC.pp module we got in the previous section and consider our job done. It will likely allow the PXC node to start and operate normally but what exactly is happening? Why did MySQL or one of its subprocesses asked for the process attributes getattr of all the running processes like sshd, syslogd and cron. Looking directly in the audit.log file, I found many entries like these:

So, ss, a network utility tool, scans all the processes. That rang a bell… I knew where to look for, the sst script. Here’s the source of the problem in the wsrep_sst_xtrabackup-v2 file:

This bash function is used when the node is a joiner and it checks using ss if the TCP port used by socat or nc is opened. The check is needed in order to avoid replying too early with the “ready” message. The code is functionally correct but wrong, security wise. Instead of looking if there is a socat or nc command running in the list of processes owned by the mysql user, it checks if any of the processes has opened the SST port and only then does it checks if the name of the command is socat or nc. Since we don’t know which processes will be running on the server, we can’t write a good security profile. For example, in the future, one could add the ntpd daemon, causing PXC to fail to start yet again. To avoid that, the function needs to be modified like this:

The modified function removes many of the denied messages in the audit log file and simplifies a lot the content of PXC.te. I tested the above modification and made a pull request to PXC. Among the remaining items, we have:

setpgid is called often used after a fork to set the process group, usually through the setsid call. MySQL uses fork when it starts with the daemonize option but our installation of Percona XtraDB cluster uses mysqld_safe and does not directly run as a daemon. Another fork call is part of the wsrep source files and is used to launch processes like the SST script and is done when mysqld is already running with reduced privileges. This later invocation is certainly our culprit.

TCP ports

What about TPC ports? PXC uses quite a few. Of course there is the 3306/tcp port used to access MySQL. Galera also uses the ports 4567/tcp for replication, 4568/tcp for IST and 4444/tcp for SST. Let’s have a look which ports SELinux allows PXC to use:

No surprise, port 3306/tcp is authorized but if you are new to MySQL, you may wonder what uses the 1186/tcp. It is the port used by NDB cluster for inter-node communication (NDB API). Now, if we try to add the missing ports:

4568/tcp was successfully added but, 4444/tcp and 4567/tcp failed because they are already assigned to another security context. For example, 4444/tcp belongs to the kerberos security context:

A TCP port is not allowed by SELinux to belong to more than one security context. We have no other choice than to move the two missing ports to the mysqld_t security context:

If you happen to be planning to deploy a Kerberos server on the same servers you may have to run PXC using a different port for Galera replication. In that case, and in the case where you want to run MySQL on a port other than 3306/tcp, you’ll need to add the port to the mysqld_port_t context like we just did above. Do not worry too much for the port 4567/tcp, it is reserved for tram which, from what I found, is a remote access protocol for routers.

Non-default paths

It is very frequent to run MySQL with non-standard paths/directories. With SELinux, you don’t list the authorized path in the security context, you add the security context labels to the paths. Adding a context label is a two steps process, basically change and apply. For example, if you are using /data as the MySQL datadir, you need to do:

On a RedHat/Centos 7 server, the MySQL file contexts and their associated paths are:

If you want to avoid security issues with SELinux, you should stay within those paths. A good example of an offending path is the PXC configuration file and directory which are now located in their own directory. These are not labeled correctly for SELinux:

I must admit that even if the security context labels on those files were not set, I got no audit messages and everything worked normally. Nevetheless, adding the labels is straightforward:

Variables check list

Here is a list of all the variables you should check for paths used by MySQL

  • datadir, default is /var/lib/mysql, where MySQL stores its data
  • basedir, default is /usr, where binaries and librairies can be found
  • character_sets_dir, default is basedir/share/mysql/charsets, charsets used by MySQL
  • general_log_file, default is the datadir, where the general log is written
  • init_file, no default, sql file read and executed when the server starts
  • innodb_undo_directory, default is datadir, where InnoDB stores the undo files
  • innodb_tmpdir, default is tmpdir, where InnoDB creates temporary files
  • innodb_temp_data_file_path, default is in the datadir, where InnoDB creates the temporary tablespace
  • innodb_parallel_doublewrite_path, default is in the datadir, where InnoDB created the parallel doublewrite buffer
  • innodb_log_group_home_dir, default is the datadir, where InnoDB writes its transational log files
  • innodb_data_home_dir, default is the datadir, used a default value for the InnoDB files
  • innodb_data_file_path, default is in the datadir, path of the system tablespace
  • innodb_buffer_pool_filename, default is in the datadir, where InnoDB writes the buffer pool dump information
  • lc_messages_dir, basedir/share/mysql
  • log_bin_basename, default is the datadir, where the binlogs are stored
  • log_bin_index, default is the datadir, where the binlog index file is stored
  • log_error, no default value, where the MySQL error log is stored
  • pid-file, no default value, where the MySQL pid file is stored
  • plugin_dir, default is basedir/lib/mysql/plugin, where the MySQL plugins are stored
  • relay_log_basename, default is the datadir, where the relay logs are stored
  • relay_log_info_file, default is the datadir, may include a path
  • slave_load_tmpdir, default is tmpdir, where the slave stores files coming from LOAD DATA INTO statements.
  • slow_query_log, default is in the datadir, where the slow queries are logged
  • socket, no defaults, where the Unix socket file is created
  • ssl_*, SSL/TLS related files
  • tmpdir, default is /tmp, where temporary files are stored
  • wsrep_data_home_dir, default is the datadir, where galera stores its files
  • wsrep_provider->base_dir, default is wsrep_data_home_dir
  • wsrep_provider->gcache_dir, default is wsrep_data_home_dir, where the gcache file is stored
  • wsrep_provider->socket.ssl_*, no defaults, where the SSL/TLS related files for the Galera protocol are stored

That’s quite a long list and I may have missed some. If for any of these variables you use a non-standard path, you’ll need to adjust the context labels as we just did above.

All together

I would understand if you feel a bit lost, I am not a SELinux guru and it took me some time to understand decently how it works. Let’s recap how we can enable SELinux for PXC from what we learned in the previous sections.

1. Install the SELinux utilities

2. Allow the TCP ports used by PXC

3. Modify the SST script

Replace the wait_for_listen function in the /usr/bin/wsrep_sst_xtrabackup-v2 file by the version above. Hopefully, the next PXC release will include a SELinux friendly wait_for_listen function.

4. Set the security context labels for the configuration files

These steps seems optional but for completeness:

5. Create the policy file PXC.te

Create the file PXC.te with this content:

6. Compile and load the policy module

7. Run for a while in Permissive mode

Set SELinux into permissive mode in /etc/sysconfig/selinux and reboot. Validate everything works fine in Permissive mode, check the audit.log for any denied messages. If there are denied messages, address them.

8. Enforce SELINUX

Last step, enforce SELinux:

Conclusion

As we can see, enabling SELinux with PXC is not straightforward but, once the process is understood, it is not that hard either. In an IT world where security is more than ever a major concern, enabling SELinux with PXC is a nice step forward. In an upcoming post, we’ll look at the other security framework, Apparmor.

5 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
james

Thanks a lot.
How about standing alone server (Oracle or Maariadb or Percona)? Sometimes, I need to disable SELinux as well.
Thanks

vo

I find it a bit strange that Percona has a blog about these SELinux issues but Percona Server/XtraDB Cluster 5.7 has these issues not fixed.

There is PS-4813 that fixes one of the issues (for Percona Server 5.6 only!) but setpgid issue still remains.

Scott

Spectacular article and just the cure for all those instructions that simply say to turn SELinux off when using Galera replication. I feel after reviewing and understanding your process that I have now created a proper, least privilege SELinux policy augment for my deployment and not just “typed what the guy on the Internet did”. 🙂 I, too, had to add the setpgid and stream socket connectto rights, but after working to understand the SELinux project’s internal workings, I see that’s not in your control, rather someone would have to convince the SELinux base policy maintainers to add these rights for this particular package’s peculiarities and I know what an uphill battle those types of conversations can be when the person on the other end is trying to balance demands from the entire Linux ecosystem. Just wanted to let you know how much I appreciated this article.