Today I noticed that one of the servers used for logging web request profiling stats is accumulating about 2GB of logs per day, written into a MyISAM table without indexes. So I thought it would be a good opportunity to see how much the ARCHIVE storage engine could help in this case.
Original Table:
*************************** 1. row ***************************
           Name: requests_061111
         Engine: MyISAM
        Version: 9
     Row_format: Dynamic
           Rows: 8048913
 Avg_row_length: 252
    Data_length: 2030206088
Max_data_length: 4294967295
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2006-11-11 00:00:00
    Update_time: 2006-11-12 00:01:45
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.62 sec)
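For reference, the conversion and the status check are plain SQL; the table name matches the one above, the rest is standard syntax (this is a sketch of what I ran, not a verbatim session log):

```sql
-- Convert the existing MyISAM log table to the ARCHIVE engine.
-- ARCHIVE supports only INSERT and SELECT and allows no indexes,
-- which is fine here since the log table has none.
ALTER TABLE requests_061111 ENGINE=ARCHIVE;

-- Re-check the table statistics (\G gives the vertical output shown here).
SHOW TABLE STATUS LIKE 'requests_061111'\G
```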
Let's see what table compression gives us:
*************************** 1. row ***************************
           Name: requests_061111
         Engine: ARCHIVE
        Version: 9
     Row_format: Dynamic
           Rows: 8048913
 Avg_row_length: 0
    Data_length: 0
Max_data_length: NULL
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: NULL
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)
Surprise: it does not tell you how much space the table takes. This is quite unfortunate, especially as it should not be hard to do: there are no indexes, and both the on-disk size and the state of the compression buffer in memory should be readily available. So we have to go and check the size of the on-disk files:
-rw-rw---- 1 mysql mysql        19 Nov 12 11:31 requests_061111.ARM
-rw-rw---- 1 mysql mysql 984628803 Nov 12 11:29 requests_061111.ARZ
-rw-rw---- 1 mysql mysql      9162 Nov 12 11:00 requests_061111.frm
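Since SHOW TABLE STATUS reports nothing, the ratio has to be computed by hand from the numbers above; a trivial sketch:

```python
# File sizes taken from the output above.
myisam_data_length = 2030206088  # Data_length of the original MyISAM table
archive_file_size = 984628803    # size of requests_061111.ARZ on disk

ratio = myisam_data_length / archive_file_size
print(f"compression ratio: {ratio:.2f}x")  # about 2x
```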
Can't say I'm impressed: we get only about a 2x compression ratio for this table, even though the data is heavily redundant (URLs, referrers, user agents, etc.)
To check whether this is a fair compression ratio, I also ran gzip requests_061111.MYD and got a file of about 520MB, almost half the size of the Archive table. I guess this is due to the incremental compression the Archive engine uses; still, it should have done much better. Maybe a configuration option should be added to allow a larger compression buffer, making compression more efficient. I also ran OPTIMIZE TABLE and the size stayed the same, which is not a big surprise as I got this table by converting a MyISAM table rather than by incremental insertions.
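The penalty of incremental compression can be sketched with Python's zlib: compressing the same redundant data as one stream versus as independent small chunks. The 32KB chunk size is my assumption here, purely for illustration of the effect:

```python
import zlib

# Redundant, log-like data, similar in spirit to a web request log.
line = b"GET /some/page.html HTTP/1.1 ref=http://example.com/ agent=Mozilla/4.0\n"
data = line * 15000  # roughly 1MB

# One stream, as `gzip requests_061111.MYD` effectively does.
whole = len(zlib.compress(data, 6))

# Independent chunks: each starts with an empty dictionary, so
# redundancy across chunk boundaries cannot be exploited.
chunk = 32 * 1024
chunked = sum(len(zlib.compress(data[i:i + chunk], 6))
              for i in range(0, len(data), chunk))

print(f"whole stream: {whole} bytes, independent 32KB chunks: {chunked} bytes")
```

On this kind of data the chunked total always comes out larger: every chunk pays for its own header and freshly built Huffman trees, and loses all cross-chunk matches.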
I also compared it to compressed MyISAM, which takes 1250MB, so Archive does compress a bit better than myisampack.
I also checked how much the table would take if compressed with bzip2. It took a while to compress, but the result was 310MB, about 40% better than gzip. I wish the Archive storage engine allowed specifying the archiving method and compression level: some people may like fast compression, while others would prefer the smallest size possible.
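The gzip-versus-bzip2 gap can be reproduced in miniature with Python's zlib and bz2 modules on similar redundant data; the data here is synthetic and illustrative, not the table's actual contents:

```python
import bz2
import zlib

# Log-like redundant data with some variation, as a request log would have.
data = b"".join(
    b"GET /page/%d HTTP/1.1 ref=http://example.com/ agent=Mozilla/5.0\n" % (i % 500)
    for i in range(20000)
)

deflate_size = len(zlib.compress(data, 9))  # DEFLATE, the algorithm behind gzip
bzip2_size = len(bz2.compress(data, 9))     # BWT-based, the algorithm behind bzip2

print(f"original: {len(data)}, deflate: {deflate_size}, bzip2: {bzip2_size}")
```

Both shrink the data enormously; how much bzip2 wins by depends on how the data's redundancy is distributed, since DEFLATE can only find matches within its 32KB window while bzip2 transforms blocks of up to 900KB at a time.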
Now regarding full table scan speed: in my case a simple full table scan query on this table went from 5 minutes to about 4 minutes, less than the table size reduction would suggest, which is surprising. Maybe this is because the server is loaded, and concurrent IO randomizes otherwise sequential IO. It would be interesting to know which buffer size the Archive storage engine uses for reads; I could not find it, and the difference could also be due to the larger read_buffer_size used by MyISAM.
Summary: the Archive storage engine does compress your data and should be good for log storage, or for storing anything that needs no updates and no indexes. However, the compression ratio is not perfect, and both the configuration options and the statistics could be improved.
Hi Peter!
What version of the server are you using? More information should be available in 5.0 for Archive than what is produced above.
Cheers,
-Brian
Brian,
This is 4.1.21.
Good that it was fixed in 5.0. Any changes regarding the compression ratio?
I suppose it's compressing the records independently; hence the not-so-good compression ratio.
Andrew,
It does not compress row by row; that would make it unusable for small row sizes. It uses a 32K buffer, as Brian tells me, which is rather small to get a good compression ratio.
Nice stuff.
6 years later, any better?
I guess the compression ratio isn't good because rows need to be readable as quickly as possible (fast compression).
How about compressing the *.MYD as small as possible? But then, voila, you still need to be able to do SELECTs on it.