Greetings everyone! Let’s discuss a third-party auditing solution for MySQL and how we can leverage MongoDB® to make sense of all that data.

The McAfee MySQL Audit plugin does a great job of capturing low-level activity within a MySQL server. It does this through some non-standard APIs, which is why installing and configuring the plugin can be a bit difficult. By default, the audit information is stored in JSON format in a text file.

There is one JSON object for each action that takes place within MySQL. If a user logs in, there’s an object. If that user queries a table, there’s an object. Imagine 1,000 active connections from an application, each doing 2 queries per second. That’s 2,000 JSON objects per second being written to the audit log. After 24 hours, that would be almost 173,000,000 audit entries!
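To make the arithmetic concrete, here is a quick back-of-the-envelope calculation in Python. It only uses the figures from the paragraph above; the variable names are mine.

```python
# Back-of-the-envelope audit volume, using the figures from the text.
connections = 1000           # active application connections
queries_per_second = 2       # queries issued by each connection
seconds_per_day = 24 * 60 * 60

events_per_second = connections * queries_per_second
events_per_day = events_per_second * seconds_per_day

print(f"{events_per_second:,} events/second")
print(f"{events_per_day:,} events/day")
```

That works out to 172,800,000 entries per day, and it does not count connect/disconnect events, so a real server would likely log even more.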

How does one make sense of that many JSON objects? One option would be to write your own parser in $YOUR_FAVORITE_LANGUAGE and convert the JSON to INSERT statements and write the data back to MySQL (Note: If you do this, you can whitelist this table within the plugin so that these INSERTs are not re-audit logged). Or, we can use a system designed to import, store and query JSON objects, such as MongoDB.

Install McAfee Audit Plugin

First we need to download the source code for the plugin and for the specific MySQL version we are running. This is not a complete step-by-step HOWTO on installing the plugin, just some high-level points.

My client for this exercise is still on Percona Server 5.1.73, so we need the source for that EXACT version from percona.com.

We can clone the mcafee/mysql-audit repository using git.

Unzip the MySQL source and compile it; just don’t do “make install”, only “./configure” and “make” are necessary.

Now compile the plugin. You may want to read the detailed instructions.

This next step is tricky and really only necessary if you are not using vanilla MySQL. It is a required step to allow the plugin to use those non-standard APIs I mentioned earlier. You need to extract the offsets for the plugin to work. Follow the instructions given.

Once that is all done, you can load the plugin into the running MySQL server.

If the plugin fails to load, check MySQL’s error log for the reason why and confer with the plugin documentation on how to resolve.

We now need to enable audit logging because nothing is enabled by default.

Look inside @@datadir and you should see a file called mysql-audit.json. You can tail -f this file if you’d like to watch it to make sure data is being written.

If you’d like some more background reading on the audit plugin, check out Fernando’s post on Experiences with McAfee Audit Plugin.

Setting Up MongoDB

Let me begin by stating this is my first time really dealing with MongoDB in any real sense. I spun up an EC2 instance in AWS (m3.large, CentOS 6) and installed MongoDB using yum and the Mongo repositories.

As the ephemeral storage for my instance had been mounted at /opt, I changed just this one option in the supplied /etc/mongod.conf and restarted mongo (service mongod restart).

I then copied the mysql-audit.json from the MySQL host using SSH:

Then I imported this JSON file directly into MongoDB:

The above mongoimport command specifies the database to import into (test) and the collection (audit). I also specify --drop, which drops the collection before importing. This drop is necessary because the Audit Plugin appends to the JSON file, so if we repeated these import steps without --drop, we would be duplicating data.
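If you want to script the re-import, a minimal Python sketch might look like the following. The --db, --collection, --drop and --file options are standard mongoimport flags, and --drop removes the target collection before loading. The file path and the function names are hypothetical, so adjust them to your setup.

```python
import subprocess


def build_import_command(json_path, db="test", collection="audit"):
    """Return the mongoimport argument list for a full re-import."""
    return [
        "mongoimport",
        "--db", db,
        "--collection", collection,
        "--drop",             # drop the existing collection first to avoid duplicates
        "--file", json_path,
    ]


def run_import(json_path):
    """Run mongoimport; requires the mongoimport binary on PATH."""
    subprocess.run(build_import_command(json_path), check=True)


if __name__ == "__main__":
    # Hypothetical path: point this at wherever the copied audit log lives.
    print(" ".join(build_import_command("/tmp/mysql-audit.json")))
```

Building the argument list separately from running it makes it easy to print and double-check the command before pointing it at a real collection.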

If there is enough interest, via the comments below, I will investigate the potential of using the socket functionality of the Audit Plugin to have the events stream directly into mongo.

For now though, it’s a wash-rinse-repeat cycle; though there is the ability to rotate the JSON audit log after a certain amount of time and import each file on a daily basis.

Making Data Make Sense

Here is a sample “document” (i.e., an audit event) that is created by the Audit Plugin.
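For readers who want something to experiment with, here is an illustrative event parsed in Python. To be clear, this is not real output from my client's server: the "cmd", "objects" and "query" field names are the ones discussed in this post, while the other field names and all of the values are assumptions made up for the example.

```python
import json

# Hypothetical audit line. Field names "cmd", "objects" and "query" come from
# the post; every value and the remaining field names are invented.
sample_line = (
    '{"date": "1412345678901", "user": "app_user", "host": "10.0.0.5", '
    '"cmd": "select", '
    '"objects": [{"db": "shop", "name": "orders", "obj_type": "TABLE"}], '
    '"query": "SELECT * FROM orders WHERE id = 1"}'
)

event = json.loads(sample_line)

print(event["user"], event["cmd"])
print([obj["name"] for obj in event["objects"]])   # tables referenced by the statement
print(list(event)[-1])                             # name of the last field in the document
```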

!! MongoDB BUG !!

Notice that the last field in the document is named “query.” When I attempted some basic aggregate() functions on this field, I received bad-syntax errors. After much frustration, lots of Googling and repeated testing, I came to the conclusion that “query” is a reserved word in MongoDB. There is little to no documentation on this, aside from an almost three-year-old bug report that simply helped confirm my suspicion.

To work around the above issue, let’s rename all of the “query” fields to “qry”:
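As a sketch of what the rename does, here is the same transformation applied to plain Python dictionaries. In MongoDB itself this is the job of the $rename update operator; the pymongo call in the comment is one way to issue it, assuming a collection handle named collection. The sample events are invented.

```python
def rename_field(doc, old, new):
    """Return a copy of doc with key `old` renamed to `new`, if present."""
    renamed = dict(doc)
    if old in renamed:
        renamed[new] = renamed.pop(old)
    return renamed


# Hypothetical events; the second one has no "query" field at all.
events = [
    {"user": "app_user", "cmd": "select", "query": "SELECT 1"},
    {"user": "app_user", "cmd": "Connect"},
]

renamed_events = [rename_field(event, "query", "qry") for event in events]
print(renamed_events)

# With pymongo, the in-database equivalent would be along the lines of:
#   collection.update_many({}, {"$rename": {"query": "qry"}})
```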

Now we can begin.

Basic Command Counters

Using any of the “top level” fields in each document, we can run reports (called aggregates in Mongo). So an easy one is to get a list of all unique “commands” and how many times they occurred.

Breaking down the command above, we are grouping all values in the “cmd” field and counting them up. The SQL equivalent would be a GROUP BY on cmd with a COUNT(*).
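To illustrate the group-and-count idea outside of Mongo, here is a small Python sketch. The pipeline variable shows one way the grouping can be written as a MongoDB aggregation pipeline, and the Counter does the same thing in memory. The events are made up for the example.

```python
from collections import Counter

# Hypothetical audit events; only the "cmd" field matters here.
events = [
    {"user": "app_user", "cmd": "select"},
    {"user": "app_user", "cmd": "select"},
    {"user": "batch", "cmd": "insert"},
    {"user": "app_user", "cmd": "Connect"},
]

# The same grouping expressed as an aggregation pipeline (for reference only).
pipeline = [{"$group": {"_id": "$cmd", "count": {"$sum": 1}}}]

# Pure-Python equivalent: count how often each command occurs.
command_counts = Counter(event["cmd"] for event in events)

for cmd, count in command_counts.most_common():
    print(cmd, count)
```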

User Counts

Let’s get a list and count of all user activities. This will include any of the commands listed in the previous aggregate.

A couple of interesting things come out here. Firstly, the tech_shrd user performs more ‘activities’ than any other user. Is this expected? Is this normal? Your environment will determine that.
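The per-user count follows the same pattern, just grouped on a different field and sorted so the busiest user comes first. Here is a Python sketch under the assumption that each event carries a "user" field; the tech_shrd name is from my data, the other users and all counts are invented.

```python
from collections import Counter

# Hypothetical audit events.
events = [
    {"user": "tech_shrd", "cmd": "select"},
    {"user": "tech_shrd", "cmd": "update"},
    {"user": "tech_shrd", "cmd": "select"},
    {"user": "reporting", "cmd": "select"},
    {"user": "batch", "cmd": "insert"},
    {"user": "batch", "cmd": "insert"},
]

# Aggregation pipeline form (for reference): group by user, busiest first.
pipeline = [
    {"$group": {"_id": "$user", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]

# Pure-Python equivalent; most_common() returns pairs sorted by count, descending.
user_counts = Counter(event["user"] for event in events).most_common()

for user, count in user_counts:
    print(user, count)
```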

Specific User Activities

Let’s pick a specific user and get their activity counts to make sure they aren’t doing anything weird.
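In aggregation terms this is a filter followed by the same group-and-count as before. A Python sketch, with invented events and a hypothetical helper name:

```python
from collections import Counter

# Hypothetical audit events.
events = [
    {"user": "tech_shrd", "cmd": "select"},
    {"user": "tech_shrd", "cmd": "select"},
    {"user": "tech_shrd", "cmd": "update"},
    {"user": "batch", "cmd": "insert"},
]

# Aggregation pipeline form (for reference): filter first, then group.
pipeline = [
    {"$match": {"user": "tech_shrd"}},
    {"$group": {"_id": "$cmd", "count": {"$sum": 1}}},
]


def activity_for_user(events, user):
    """Count commands issued by a single user."""
    return Counter(event["cmd"] for event in events if event["user"] == user)


print(activity_for_user(events, "tech_shrd"))
```

Putting the $match stage first keeps the grouping stage from having to look at events for every other user.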

The SQL equivalent is the same GROUP BY on cmd with a WHERE clause restricting the rows to that one user.

Activities By User

We saw above that there were 2 insert commands. Who ran those?

More simply, we could have just done this to see the entire document/record which would include the SQL that the user executed, timestamp, hostname, etc.
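Both approaches can be sketched in Python as follows: one groups the insert events by user, the other simply returns the matching documents in full. The events are invented, and the field names assume the “query” field has already been renamed to “qry”.

```python
from collections import Counter

# Hypothetical audit events.
events = [
    {"user": "batch", "cmd": "insert", "qry": "INSERT INTO t1 VALUES (1)"},
    {"user": "app_user", "cmd": "select", "qry": "SELECT * FROM t1"},
    {"user": "batch", "cmd": "insert", "qry": "INSERT INTO t1 VALUES (2)"},
]

# Filter that could be passed to find(), or used in a $match stage (for reference).
insert_filter = {"cmd": "insert"}

# Option 1: who ran the inserts, and how many each?
inserts_by_user = Counter(e["user"] for e in events if e["cmd"] == "insert")

# Option 2: the full documents, including the SQL text that was executed.
insert_events = [e for e in events if e["cmd"] == "insert"]

print(inserts_by_user)
for event in insert_events:
    print(event["user"], event["qry"])
```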

The SQL equivalents are a GROUP BY on user filtered to cmd = 'insert', and a plain SELECT * with the same filter.

Table Activity

The most complex example I could come up with was trying to find out how many times each table was referenced. In theory, with weeks or even months of audit data, we could decide which tables aren’t needed any longer by the application.

You’ll notice in the sample document above that “objects” is an array with one element for each table/view referenced in the ‘qry’ field. We need to “unwind” this array into single elements before we can count them. If someone knows a better way, please let me know.

The Audit Plugin uses “*” to represent a derived table from a sub-SELECT, which has no proper name, so we filter those entries out of the results.
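Here is a Python sketch of the unwind, filter and count steps. It assumes each element of “objects” has a "name" key holding the table name, which is an assumption on my part for this example, and the events are invented. The pipeline variable shows one way the same steps could be written as aggregation stages.

```python
from collections import Counter

# Hypothetical audit events; "*" marks a derived table from a sub-SELECT.
events = [
    {"cmd": "select", "objects": [{"name": "orders"}, {"name": "customers"}]},
    {"cmd": "select", "objects": [{"name": "orders"}, {"name": "*"}]},
    {"cmd": "update", "objects": [{"name": "orders"}]},
]

# Aggregation pipeline form (for reference).
pipeline = [
    {"$unwind": "$objects"},
    {"$match": {"objects.name": {"$ne": "*"}}},
    {"$group": {"_id": "$objects.name", "count": {"$sum": 1}}},
]


def unwind(events, field):
    """Yield one copy of each event per element of the array in `field`."""
    for event in events:
        for element in event.get(field, []):
            yield {**event, field: element}


table_counts = Counter(
    row["objects"]["name"]
    for row in unwind(events, "objects")
    if row["objects"]["name"] != "*"
)

print(table_counts)
```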

Audit Plugin Caveat: The ‘objects’ array is not a distinct list of the tables involved. For example, a SELECT statement that self-joins twice would produce 3 identical elements in the ‘objects’ array for that audit record. This may skew results. If anyone knows a cool Mongo trick to remove duplicates, please share in the comments.
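One possible way around the duplicates, offered as a sketch rather than a tested recipe: collapse each event's tables to a distinct set before counting. In aggregation terms that could be a first $group on the pair (document _id, table name) followed by a second $group on the table name. The Python below emulates that on invented events, again assuming a "name" key inside each element of “objects”.

```python
from collections import Counter

# Hypothetical events: the first one self-joins "employees" twice.
events = [
    {"objects": [{"name": "employees"}, {"name": "employees"}, {"name": "employees"}]},
    {"objects": [{"name": "employees"}, {"name": "departments"}]},
]

# Aggregation pipeline form (for reference): dedupe per document, then count.
pipeline = [
    {"$unwind": "$objects"},
    {"$group": {"_id": {"doc": "$_id", "table": "$objects.name"}}},
    {"$group": {"_id": "$_id.table", "count": {"$sum": 1}}},
]

# Naive count: every array element counts, so the self-join inflates the total.
naive_counts = Counter(obj["name"] for e in events for obj in e["objects"])

# Distinct count: each table counts at most once per audit event.
distinct_counts = Counter()
for event in events:
    distinct_counts.update({obj["name"] for obj in event["objects"]})

print(naive_counts)
print(distinct_counts)
```

With the distinct version, the count answers "how many statements touched this table" rather than "how many times was this table named".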

Conclusion

For a quick wrap-up, we installed the McAfee Audit Plugin, exported some audit data, set up a MongoDB instance in AWS and imported the audit data. As you can see, the possibilities are plentiful on what kind of information you can gather. Feel free to comment on an aggregation you’d like to see if we were running this type of audit on your system.

Cheers,
Matthew

4 Comments
Shahriyar Rzayev

Thanks for tutorial.

Did you encounter any SELinux issues with the McAfee plugin while testing?

Guy LIchtman

Hi Matthew,

Nice post.

I am the main maintainer of the McAfee MySQL Audit Plugin and it is great to see how you did the integration with MongoDB. Your note about duplicate objects and how it affects reporting in MongoDB is something I never thought about. I think filtering the duplicate objects is something that can actually be done on the plugin side and I hope we can put in a simple fix for this soon.

If you have any suggestions for improvements please feel free to open an issue on the github project: https://github.com/mcafee/mysql-audit/

James Spurt

Best tutorial I have ever come across. Thanks for sharing it!!

James Spurt

One of the best tutorials for auditing mysql with McAfee. It really works!! Thanks!!