As a MySQL DBA/consultant, decoding MySQL binary logs is part of my job, and there are a number of reasons for doing it. In this post, I'll explain how you can extract the important information about your write workload from MySQL row-based binary logs using a simple awk script.

MySQL Row-Based Binary Logs

First, it is important to understand that row-based binary logs contain the actual changes made by a query. For example, if I run a DELETE query against a table, the binary log will contain the rows that were deleted. MySQL provides the mysqlbinlog utility to decode the events stored in the binary logs; you can read about mysqlbinlog in detail in the reference manual.

The following example illustrates how mysqlbinlog displays row events that specify data modifications. These correspond to events with the WRITE_ROWS_EVENT, UPDATE_ROWS_EVENT, and DELETE_ROWS_EVENT type codes.

We will use the following mysqlbinlog options:

  • --base64-output=decode-rows
  • --verbose, -v
  • --start-datetime="datetime"
  • --stop-datetime="datetime"

We have a server running with row-based binary logging.

We created a test table and executed the following sequence of statements within a transaction.
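The original statements are not reproduced in this excerpt; a transaction along these lines (table name and values are illustrative) would produce the one-insert, one-update, one-delete pattern summarized later in the post:

```sql
-- Illustrative only: a small test table and a transaction containing
-- one statement of each type.
CREATE TABLE test.t (i INT PRIMARY KEY, s VARCHAR(10));

START TRANSACTION;
INSERT INTO test.t VALUES (1, 'a');
UPDATE test.t SET s = 'b' WHERE i = 1;
DELETE FROM test.t WHERE i = 1;
COMMIT;
```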

Now let’s see how it is represented in binary logs.

  • The row with "Table_map: test.t" gives the name of the table the query modified.
  • Rows with "Write_rows/Update_rows/Delete_rows" give the event type.
  • Lines starting with "###" show the actual row data that changed.
  • Columns are represented as @1, @2, and so on.
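Decoded with --base64-output=decode-rows -v, such a transaction shows up roughly like this (offsets, server ids and values are illustrative, not copied from the post):

```
#150116 13:21:15 server id 1  end_log_pos 367  Table_map: `test`.`t` mapped to number 70
#150116 13:21:15 server id 1  end_log_pos 407  Write_rows: table id 70 flags: STMT_END_F
### INSERT INTO `test`.`t`
### SET
###   @1=1
###   @2='a'
# at 407
#150116 13:21:15 server id 1  end_log_pos 470  Table_map: `test`.`t` mapped to number 70
#150116 13:21:15 server id 1  end_log_pos 526  Update_rows: table id 70 flags: STMT_END_F
### UPDATE `test`.`t`
### WHERE
###   @1=1
###   @2='a'
### SET
###   @1=1
###   @2='b'
...
COMMIT/*!*/;
```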

Now have a look at our simple awk script, which takes the mysqlbinlog output and prints a tidy summary of the INSERT/UPDATE/DELETE events in row-based binary logs. Just replace "mysqld-bin.000023" with your binary log file. The string "#15" in the third line matches events from the year 2015; if you are decoding a binary log file from 2014, change it to "#14". It is also recommended to use the "--start-datetime" and "--stop-datetime" options to decode a specific time range instead of a whole large binary log file.

Script:
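The original script body did not survive in this excerpt, so here is a minimal sketch in the same spirit (the field positions, the output format, and the canned input are assumptions, and it relies on each statement being preceded by its own Table_map event). Real usage would pipe `mysqlbinlog --base64-output=decode-rows -v mysqld-bin.000023` into the awk program; here a canned snippet of decoded output keeps the sketch self-contained:

```shell
summary=$(awk '
function flush() {
    # Print one summary line per statement, if any rows were counted
    if (s_count > 0)
        printf "Timestamp : %s Table : %s Query Type : %s %d row(s) affected\n",
               ts, tbl, s_type, s_count
    s_count = 0
}
/Table_map:.*mapped to number/ {        # event header: note timestamp and table
    flush()
    ts = $1 " " $2
    for (i = 1; i <= NF; i++) if ($i == "Table_map:") tbl = $(i + 1)
    gsub("`", "", tbl)                  # `db`.`t` -> db.t
}
/^### INSERT INTO/ { s_type = "Insert(s)"; s_count++; ins++; tot++ }
/^### UPDATE/      { s_type = "Update(s)"; s_count++; upd++; tot++ }
/^### DELETE FROM/ { s_type = "Delete(s)"; s_count++; del++; tot++ }
/^COMMIT/ {                             # end of transaction: print totals
    flush()
    printf "[Transaction total : %d Insert(s) : %d Update(s) : %d Delete(s) : %d]\n",
           tot, ins, upd, del
    tot = ins = upd = del = 0
}' <<'EOF'
#150116 13:41:06 server id 1  end_log_pos 332  Table_map: `sakila`.`country` mapped to number 70
### INSERT INTO `sakila`.`country`
### SET
###   @1=110
### INSERT INTO `sakila`.`country`
### SET
###   @1=111
#150116 13:41:07 server id 1  end_log_pos 500  Table_map: `sakila`.`country` mapped to number 70
### DELETE FROM `sakila`.`country`
### WHERE
###   @1=110
COMMIT/*!*/;
EOF
)
printf '%s\n' "$summary"
```

Against the canned snippet this prints two per-statement lines followed by the transaction summary `[Transaction total : 3 Insert(s) : 2 Update(s) : 0 Delete(s) : 1]`.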

Output:

This awk script returns the following columns as output for every transaction in the binary logs.

Timestamp : timestamp of the event as logged in the binary log.
Table : database.tablename
Query Type : type of query executed against the table, and the number of rows affected by the query.

[Transaction total : 3 Insert(s) : 1 Update(s) : 1 Delete(s) : 1]
The line above summarizes the transaction: the total number of rows affected by the transaction, and the number of rows affected by each type of query within it.

Let's execute some more queries in the sakila database and then summarize them using our script. Assume these queries were executed between "2015-01-16 13:30:00" and "2015-01-16 14:00:00".
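The queries themselves are not shown in this excerpt; hypothetical examples of the kind of sakila statements meant here:

```sql
-- Hypothetical examples only, not the post's actual statements
INSERT INTO sakila.country (country) VALUES ('Testland');
UPDATE sakila.city SET city = 'New City' WHERE city_id = 1;

START TRANSACTION;
DELETE FROM sakila.payment WHERE customer_id = 1;
DELETE FROM sakila.rental WHERE customer_id = 1;
COMMIT;
```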

We will save our script in a file named "summarize_binlogs.sh" and use it to summarize the queries we just executed.

Now we run our script to get a summary of the information logged in the binary log.

See how nicely our script summarizes the information logged in the binary log. For the last transaction, it shows three lines, one for each statement executed within the transaction, followed by the final transaction summary: the transaction affected 4 rows in total, all of them deleted.

That is how you can summarize MySQL row-based binary logs in a more readable format using a simple awk script, which will save you time whenever you need to decode binary logs for troubleshooting. Now I'll show how this summarized information can help answer some basic but important questions.

Q1: Which tables received the highest number of INSERT/UPDATE/DELETE statements?
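Assuming the summary lines follow the Timestamp / Table / Query Type layout described above (a canned sample stands in for real `sh summarize_binlogs.sh` output), one sketch is to group by table and query type:

```shell
# Canned sample of summarizer output; format assumed, see column description above
cat > summary.txt <<'EOF'
Timestamp : #150116 13:41:06 Table : sakila.country Query Type : Insert(s) 2 row(s) affected
Timestamp : #150116 13:42:10 Table : sakila.city Query Type : Update(s) 5 row(s) affected
Timestamp : #150116 13:43:21 Table : sakila.country Query Type : Delete(s) 1 row(s) affected
Timestamp : #150116 13:44:00 Table : sakila.country Query Type : Insert(s) 1 row(s) affected
EOF
# Count statements per table and type, busiest first ($7 = table, $11 = type)
grep 'Table :' summary.txt | awk '{ print $7, $11 }' | sort | uniq -c | sort -nr
```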

Q2: Which table received the highest number of DELETE queries?
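With the same assumed summary format, restrict the pipeline to Delete(s) lines before counting (again using a canned sample in place of real script output):

```shell
cat > summary.txt <<'EOF'
Timestamp : #150116 13:41:06 Table : sakila.country Query Type : Insert(s) 2 row(s) affected
Timestamp : #150116 13:43:21 Table : sakila.country Query Type : Delete(s) 1 row(s) affected
Timestamp : #150116 13:45:02 Table : sakila.film_actor Query Type : Delete(s) 3 row(s) affected
Timestamp : #150116 13:46:11 Table : sakila.film_actor Query Type : Delete(s) 2 row(s) affected
EOF
# Keep only Delete(s) lines, count per table ($7), busiest first
grep 'Delete(s)' summary.txt | awk '{ print $7 }' | sort | uniq -c | sort -nr | head -1
```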

Q3: How many INSERT/UPDATE/DELETE queries were executed against the sakila.country table?
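Still assuming the same summary layout, filter on the table name and count per query type:

```shell
cat > summary.txt <<'EOF'
Timestamp : #150116 13:41:06 Table : sakila.country Query Type : Insert(s) 2 row(s) affected
Timestamp : #150116 13:43:21 Table : sakila.country Query Type : Delete(s) 1 row(s) affected
Timestamp : #150116 13:44:00 Table : sakila.country Query Type : Insert(s) 1 row(s) affected
Timestamp : #150116 13:45:00 Table : sakila.city Query Type : Update(s) 5 row(s) affected
EOF
# Count statements per query type ($11) for one table
grep 'Table : sakila.country' summary.txt | awk '{ print $11 }' | sort | uniq -c
```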

Q4: Show the top 3 statements that affected the largest number of rows.
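One way to sketch this, assuming the row count sits in field 12 of each summary line, is a numeric sort on that field:

```shell
cat > summary.txt <<'EOF'
Timestamp : #150116 13:41:06 Table : sakila.country Query Type : Insert(s) 2 row(s) affected
Timestamp : #150116 13:42:10 Table : sakila.city Query Type : Update(s) 5 row(s) affected
Timestamp : #150116 13:43:21 Table : sakila.payment Query Type : Delete(s) 1200 row(s) affected
Timestamp : #150116 13:44:00 Table : sakila.rental Query Type : Delete(s) 800 row(s) affected
EOF
# $12 is the row count; sort on it numerically, largest first, keep top 3
grep 'Table :' summary.txt | sort -k12 -nr | head -3
```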

See how easy it is to identify the large transactions.

Q5: Find DELETE queries that affected more than 1000 rows.

We can also broaden this to all queries, of any type, that affected more than 1000 rows.
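Both variants reduce to a numeric filter in awk, assuming field 11 holds the query type and field 12 the row count:

```shell
cat > summary.txt <<'EOF'
Timestamp : #150116 13:43:21 Table : sakila.payment Query Type : Delete(s) 1200 row(s) affected
Timestamp : #150116 13:44:00 Table : sakila.rental Query Type : Delete(s) 15 row(s) affected
Timestamp : #150116 13:45:10 Table : sakila.city Query Type : Update(s) 1500 row(s) affected
Timestamp : #150116 13:46:30 Table : sakila.country Query Type : Insert(s) 2000 row(s) affected
EOF
# DELETE statements that affected more than 1000 rows ($11 = type, $12 = rows):
awk '$11 == "Delete(s)" && $12 > 1000' summary.txt
# Any statement that affected more than 1000 rows:
awk '$12 > 1000' summary.txt
```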

Conclusion

The information logged in the binary logs is really helpful for understanding our write workload. We used mysqlbinlog + awk to present the binary log contents in a more readable format, and then used the summarized information to answer some important questions.

13 Comments
Aftab Khan

Nice post! I think the same information (workload) can be easily seen using performance schema:

mysql> select table_schema, table_name, rows_fetched, rows_inserted, rows_updated,
rows_deleted from x$schema_table_statistics where table_schema='test'\G
*************************** 1. row ***************************
table_schema: test
table_name: tbl_partition
rows_fetched: 846933340
rows_inserted: 588251
rows_updated: 488251
rows_deleted: 100000
1 row in set (0.03 sec)

Shantanu Oak

If you are not using row-based binary logs, then something like this would work.

mysqlbinlog ${1} | grep -i -e "^update" -e "^insert" -e "^delete" -e "^replace" -e "^alter" | cut -c1-100 | tr '[A-Z]' '[a-z]' | sed -e "s/\t/ /g;s/\`//g;s/(.*$//;s/ set .*$//;s/ as .*$//" | sed -e "s/ where .*$//" | sort | uniq -c | sort -nr

Sanju

Alok,

Suppose I have a binlog in ROW format and I want to extract the SQL statements for a particular query. How can I do that?

Thanks
Sanju.

Sanju

I mean, extract the SQL statements for a single table.

Alok Pathak

Hi Sanju,

When using row-based binary logging, the actual changes are logged instead of the query that made them. If you're using MySQL 5.6, you can enable the "binlog_rows_query_log_events" variable to log the original query (for informational purposes only) along with the row data. You can read more about this at: http://dev.mysql.com/doc/refman/5.6/en/replication-options-binary-log.html#sysvar_binlog_rows_query_log_events.

Regards,
Alok

carelessadmin

There is an error in "{if(match($0, /#15.*Table_map:.*mapped to number/))".
Better to use {if(match($0, /^#.*Table_map:.*mapped to number/))

Anurag Kale

Is there any similar way of tracing the DDL changes made during a time range in MySQL?

ani

Is there a way to find out which columns actually changed in the table? I want to find the columns that are frequently updated.

ani

I have also tried creating triggers on the replica, but they are not fired by binary log events; I don't know why.

Mannoj

Guys, this is useful, but the script only shows timestamps for binlogs dated 2015; for other years it won't. Percona team, please change line number 8 -> #15: remove that entry so it works for binlogs from any year, or change it as per carelessadmin.

Also, for people looking for counts at this level:

1. Line number 8 -> #15 (remove the #15 entry)
2. sh summarize_binlogs.sh | grep -i Table | cut -d ':' -f '2,3,4' | sort | uniq -c

Jitendra

Hi, I have one doubt: the Table_map id in the MySQL binary log seems to behave somewhat randomly. First of all, it does not match the table id defined in INFORMATION_SCHEMA.INNODB_SYS_TABLES.

Secondly, when I restart my database, the Table_map value in the binary log changes to a different value.

I cannot correlate it: how, and on what basis, does this table_map value change?

Any help on this is highly appreciated.

venkat

Hi Alok,
I am using mixed replication on 5.7.24. Can you share a script that gives me statistics about which tables have the most operations (inserts, updates and deletes) going on? I would be really thankful.

Venkat

ishant mahajan

Is there any way to get the timestamp down to milli- or microseconds?