Today I noticed that one of the servers used for logging web request profiling stats is accumulating about 2GB of logs per day, written into a MyISAM table without indexes. So I thought it would be a good opportunity to see how much the ARCHIVE storage engine could help in this case.
Original Table:
*************************** 1. row ***************************
           Name: requests_061111
         Engine: MyISAM
        Version: 9
     Row_format: Dynamic
           Rows: 8048913
 Avg_row_length: 252
    Data_length: 2030206088
Max_data_length: 4294967295
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2006-11-11 00:00:00
    Update_time: 2006-11-12 00:01:45
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.62 sec)
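For reference, the conversion and the status check are plain SQL; the table name matches the one above, the rest is standard syntax (this is a sketch of what I ran, not a verbatim session log):

```sql
-- Convert the existing MyISAM log table to the ARCHIVE engine.
-- ARCHIVE supports only INSERT and SELECT and allows no indexes,
-- which is fine here since the log table has none.
ALTER TABLE requests_061111 ENGINE=ARCHIVE;

-- Re-check the table statistics (\G gives the vertical output shown here).
SHOW TABLE STATUS LIKE 'requests_061111'\G
```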
Let's see what table compression gives us:
*************************** 1. row ***************************
           Name: requests_061111
         Engine: ARCHIVE
        Version: 9
     Row_format: Dynamic
           Rows: 8048913
 Avg_row_length: 0
    Data_length: 0
Max_data_length: NULL
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: NULL
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)
Surprise: it does not tell you how much space the table takes. This is quite unfortunate, especially as it should not be hard to do: there are no indexes, and both the on-disk size and the state of the compression buffer in memory should be readily available. So we have to go and check the size of the on-disk files:
-rw-rw---- 1 mysql mysql        19 Nov 12 11:31 requests_061111.ARM
-rw-rw---- 1 mysql mysql 984628803 Nov 12 11:29 requests_061111.ARZ
-rw-rw---- 1 mysql mysql      9162 Nov 12 11:00 requests_061111.frm
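Since SHOW TABLE STATUS reports nothing, the ratio has to be computed by hand from the numbers above; a trivial sketch:

```python
# File sizes taken from the output above.
myisam_data_length = 2030206088  # Data_length of the original MyISAM table
archive_file_size = 984628803    # size of requests_061111.ARZ on disk

ratio = myisam_data_length / archive_file_size
print(f"compression ratio: {ratio:.2f}x")  # about 2x
```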
Can't say I'm impressed: we get only about a 2x compression ratio for this table, even though the data is heavily redundant (URLs, referrers, user agents, etc.)
To check whether this is a fair compression ratio, I also ran gzip requests_061111.MYD and got a file of about 520MB, almost half the size of the Archive table. I guess this is due to the incremental compression the Archive engine uses; still, it should have done much better. Maybe a configuration option should be added to allow a larger compression buffer, making compression more efficient. I also ran OPTIMIZE TABLE and the size stayed the same, which is not a big surprise as I got this table by converting a MyISAM table rather than by incremental insertions.
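The penalty of incremental compression can be sketched with Python's zlib: compressing the same redundant data as one stream versus as independent small chunks. The 32KB chunk size is my assumption here, purely for illustration of the effect:

```python
import zlib

# Redundant, log-like data, similar in spirit to a web request log.
line = b"GET /some/page.html HTTP/1.1 ref=http://example.com/ agent=Mozilla/4.0\n"
data = line * 15000  # roughly 1MB

# One stream, as `gzip requests_061111.MYD` effectively does.
whole = len(zlib.compress(data, 6))

# Independent chunks: each starts with an empty dictionary, so
# redundancy across chunk boundaries cannot be exploited.
chunk = 32 * 1024
chunked = sum(len(zlib.compress(data[i:i + chunk], 6))
              for i in range(0, len(data), chunk))

print(f"whole stream: {whole} bytes, independent 32KB chunks: {chunked} bytes")
```

On this kind of data the chunked total always comes out larger: every chunk pays for its own header and freshly built Huffman trees, and loses all cross-chunk matches.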
I also compared it to compressed MyISAM, which takes 1250MB, so Archive does compress a bit better than myisampack.
I also checked how much the table would take if compressed with bzip2. It took a while to compress, but the result was 310MB, about 40% better than gzip. I wish the Archive storage engine allowed specifying the archiving method and compression level: some people may like fast compression, while others would prefer the smallest size possible.
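The gzip-versus-bzip2 gap can be reproduced in miniature with Python's zlib and bz2 modules on similar redundant data; the data here is synthetic and illustrative, not the table's actual contents:

```python
import bz2
import zlib

# Log-like redundant data with some variation, as a request log would have.
data = b"".join(
    b"GET /page/%d HTTP/1.1 ref=http://example.com/ agent=Mozilla/5.0\n" % (i % 500)
    for i in range(20000)
)

deflate_size = len(zlib.compress(data, 9))  # DEFLATE, the algorithm behind gzip
bzip2_size = len(bz2.compress(data, 9))     # BWT-based, the algorithm behind bzip2

print(f"original: {len(data)}, deflate: {deflate_size}, bzip2: {bzip2_size}")
```

Both shrink the data enormously; how much bzip2 wins by depends on how the data's redundancy is distributed, since DEFLATE can only find matches within its 32KB window while bzip2 transforms blocks of up to 900KB at a time.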
Now regarding full table scan speed: in my case a simple full table scan query on this table went from 5 minutes to about 4 minutes, less than the table size reduction would suggest, which is surprising. Maybe this is because the server is loaded, and concurrent IO randomizes otherwise sequential IO. It would be interesting to know which buffer size the Archive storage engine uses for reads; I could not find it, and the difference could also be due to the larger read_buffer_size used by MyISAM.
Summary: the Archive storage engine does compress your data and should be good for log storage, or for storing anything that needs no updates and no indexes. However, the compression ratio is not perfect, and both the configuration options and the statistics could be improved.
Hi Peter!
What version of the server are you using? More information should be available in 5.0 for Archive than what is produced above.
Cheers,
-Brian
Brian,
This is 4.1.21.
Good that it was fixed in 5.0. Any changes regarding the compression ratio?
I suppose it's compressing the records independently; hence the not-so-good compression ratio.
Andrew,
It does not compress row by row; that would make it unusable for small row sizes. It uses a 32K buffer, as Brian tells me, which is rather small to get a good compression ratio.
Nice stuff.
6 years later, any better?
I guess the compression ratio isn't good because rows need to be readable as quickly as possible (fast compression).
How about compressing the *.MYD as small as possible? But then, voila, you still need to be able to do SELECTs on it.