PostgreSQL extensions are great! Simply by adding an extension, one transforms an otherwise vanilla general-purpose database management system into one capable of processing specialized data requirements in a highly optimized fashion. Some extensions, like pg_repack, simplify and enhance existing features, while other extensions, such as PostGIS and pgvector, add completely new capabilities.
I’d like to introduce you to the PostgreSQL extension TimescaleDB. As the name implies, TimescaleDB is all about data recorded against time. In database systems, time series data typically consists of relatively small data points of one or more dimensions, recorded continuously over time. A good example is the gathering of weather data, i.e., time versus air pressure, UV index, wind speed, temperature, humidity, etc.
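To make the weather example concrete, here’s a minimal sketch of what such a table might look like (the table and column names are illustrative only, not taken from any real schema):

```sql
-- hypothetical weather-station readings table:
-- one narrow row per station per observation time
create table weather_readings(
    station_id   int         not null,
    observed_at  timestamptz not null,
    pressure_hpa numeric(6,1),
    uv_index     numeric(3,1),
    wind_speed   numeric(5,2),
    temp_c       numeric(4,1),
    humidity_pct numeric(4,1)
);
```

Note the shape: each row is tiny, but rows arrive constantly, one per station per measurement interval, which is exactly the workload profile discussed below.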
Time series data processing can be taxing to any database system:
- Data INSERTs are coming in at a furious rate.
- Data retention requirements can be in the hundreds of terabytes, even petabytes.
- Data analysis and report generation capabilities of both archived data and data that’s coming in real-time are often required.
While PostgreSQL is quite capable of ingesting, managing, and analyzing time series data, there are other solutions, many of them proprietary, that can perform data ingestion and generate actionable insights at a faster rate.
Which brings us back to TimescaleDB.
TimescaleDB was created as a means to improve the rate of data ingestion of time series data into PostgreSQL by simplifying and automating many of the steps that an experienced DBA uses in order to optimize and maintain the health of the database over its lifecycle.
Installing and enabling the TimescaleDB extension
The best way to appreciate the possibilities is, of course, to install the extension and try it out!
Step one: Create the PostgreSQL file repository configuration
Because my Linux machine is Ubuntu, the following commands are used to install the extension:
```shell
echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list

wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
```
Step two: Get the extension
```shell
echo "deb https://packagecloud.io/timescale/timescaledb/ubuntu/ $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/timescaledb.list

wget --quiet -O - https://packagecloud.io/timescale/timescaledb/gpgkey | sudo apt-key add -

apt update && apt upgrade -y
```
As I’m using Ubuntu “Jammy,” I have the following available packages:
```shell
# this command returns the package list
apt search postgresql | grep -A 3 -E '^timescaledb-2-postgresql'
```
```
# A nice collection of packages covering all currently supported versions of postgres

timescaledb-2-postgresql-12/jammy 2.11.2~ubuntu22.04 amd64
  An open-source time-series database based on PostgreSQL, as an extension.

timescaledb-2-postgresql-13/jammy 2.13.1~ubuntu22.04 amd64
  An open-source time-series database based on PostgreSQL, as an extension.

timescaledb-2-postgresql-14/jammy 2.13.1~ubuntu22.04 amd64
  An open-source time-series database based on PostgreSQL, as an extension.

timescaledb-2-postgresql-15/jammy 2.13.1~ubuntu22.04 amd64
  An open-source time-series database based on PostgreSQL, as an extension.

timescaledb-2-postgresql-16/jammy 2.13.1~ubuntu22.04 amd64
  An open-source time-series database based on PostgreSQL, as an extension.
```
Step three: Install TimescaleDB packages
```shell
apt install timescaledb-2-postgresql-16
```
The available packages include:
```
timescaledb-2-loader-postgresql-16   # The loader for TimescaleDB

timescaledb-2-postgresql-16          # The timescale extension for postgres 16

timescaledb-toolkit-postgresql-16    # A library of analytical hyperfunctions,
                                     # time-series pipelining, and other SQL utilities

timescaledb-tools                    # A suite of tools that can be used with
                                     # TimescaleDB
                                     #   PGBIN/timescaledb-parallel-copy
                                     #   PGBIN/timescaledb-tune
```
Step four: Tune the data-cluster
The CLI utility timescaledb-tune is great, as it can be used to generate an optimized set of Postgres runtime parameters. However, it assumes that Postgres is the only major process running on the host, competing for neither RAM nor CPU.
Suggested test invocation:
```shell
/usr/bin/timescaledb-tune --help 2>&1 | less
```
And here’s the invocation saving the results in postgresql.auto.conf:
```shell
/usr/bin/timescaledb-tune \
    --conf-path /var/lib/postgresql/16/main/postgresql.auto.conf \
    -pg-version 16 \
    -quiet \
    -yes
```
The generated settings:

```
shared_preload_libraries = 'timescaledb'
shared_buffers = 5967MB
effective_cache_size = 17901MB
maintenance_work_mem = 2047MB
work_mem = 5091kB
timescaledb.max_background_workers = 16
max_worker_processes = 31
max_parallel_workers_per_gather = 6
max_parallel_workers = 12
wal_buffers = 16MB
min_wal_size = 512MB
max_wal_size = 1GB
default_statistics_target = 100
random_page_cost = 1.1
checkpoint_completion_target = 0.9
max_connections = 100
max_locks_per_transaction = 256
autovacuum_max_workers = 10
autovacuum_naptime = 10
effective_io_concurrency = 256
timescaledb.last_tuned = '2024-03-01T20:49:53Z'
timescaledb.last_tuned_version = '0.15.0'
```
The service can now be restarted:
```shell
systemctl restart postgresql@16-main
```
And we can confirm that Postgres is listening:

```
$ netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address      Foreign Address  State   PID/Program name
tcp        0      0 127.0.0.53:53      0.0.0.0:*        LISTEN  6786/systemd-resolv
tcp        0      0 0.0.0.0:22         0.0.0.0:*        LISTEN  6579/sshd: /usr/sbi
tcp        0      0 127.0.0.1:5432     0.0.0.0:*        LISTEN  11447/postgres
tcp6       0      0 :::22              :::*             LISTEN  6579/sshd: /usr/sbi
```
Step five: Create database and extension
Pretty standard stuff, create your database and extension:
```shell
createdb db01

psql db01<<_eof_
    create extension timescaledb;
    -- optional; turn off telemetry
    alter system set timescaledb.telemetry_level=off;
_eof_
```
Once created, the TimescaleDB extension installs a number of functions, views, and tables in the database:
```
db01=# \dn
          List of schemas
           Name           |       Owner
--------------------------+-------------------
 _timescaledb_cache       | postgres
 _timescaledb_catalog     | postgres
 _timescaledb_config      | postgres
 _timescaledb_functions   | postgres
 _timescaledb_internal    | postgres
 public                   | pg_database_owner
 timescaledb_experimental | postgres
 timescaledb_information  | postgres
```
Working with Timescale
Now that everything has been set up, we’re ready to explore.
Scenario one
Creating the tables
Let’s create two tables: the first is the standard heap table found in PostgreSQL, while the second will, of course, be TimescaleDB’s hypertable.
Creating a TimescaleDB hypertable defines the partitioning rule using a designated date/time column from the table. In this case, it’s column t_stamp, of datatype timestamptz (timestamp with time zone), that serves as the partitioning key.
```sql
-- create the table that will become the hypertable
create table t_timescale(
    id uuid
    ,c2 int default random()*1E6
    ,c3 int default random()*1E6
    ,c4 int default random()*1E6
    ,c5 int default random()*1E6
    ,c6 int default random()*1E6
    ,c7 int default random()*1E6
    ,t_stamp timestamptz not null default clock_timestamp()
);
```
This is the command required to create the hypertable:
```sql
select create_hypertable('t_timescale', 't_stamp');
```
The ordinary heap table is created thusly:
```sql
create table t_standard (like t_timescale);
```
Increasing/decreasing chunk size
TimescaleDB partitions its hypertables into chunks. The default chunk interval is seven days, but it can be set to any time range desired.
For the purposes of this blog, and because I’m using small data sets, the chunk interval for table t_timescale is set at 10 minutes:
```sql
select set_chunk_time_interval('t_timescale', interval '10 minutes');
```
Chunk size best practices:
- Never make the chunk larger than the PostgreSQL shared buffer and free RAM.
- When in doubt, always start with chunks known to be smaller than the shared buffer. The reasoning is that it’s easier to make smaller chunks bigger than to make larger chunks smaller in order to fit the available RAM.
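To verify what the chunk interval currently is, one can query TimescaleDB’s information views; a sketch, assuming the timescaledb_information.dimensions view of TimescaleDB 2.x:

```sql
-- show the partitioning column and chunk interval for the hypertable
select hypertable_name, column_name, time_interval
from timescaledb_information.dimensions
where hypertable_name = 't_timescale';
```

On the table configured above, the time_interval column should report the 10-minute setting.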
Populating the tables
Prior to populating the tables, the psql meta-command \timing is invoked:

```
\timing
```
While the standard table didn’t have any index at table creation, thus accelerating data population, the hypertable’s index was added at the time of the hypertable’s creation. Please note: these numbers will vary wildly according to the hardware used, i.e., mine is relatively low-end.
```sql
-- HYPER --
-- 44:14 minutes: 54GB
insert into t_timescale(id)
    select gen_random_uuid() from generate_series(1, 5E8);

-- determining table size
select * from pg_size_pretty(hypertable_size('t_timescale'));
```
```sql
-- HEAP --
-- 08:58 minutes: 36 GB
insert into t_standard
    select * from t_timescale;

-- 04:18 minutes: 10 GB
create index on t_standard(t_stamp);
```
Taking a closer look at the t_timescale schema definition, one sees that five partitioned child tables have been created. Recalling the chunk interval setting of 10 minutes prior to the data population, a new partition was created each time the interval advanced by 10 minutes:
```
Table Structure: t_timescale
 Column  |           Type           |
---------+--------------------------+
 id      | uuid                     |
 c2      | integer                  |
 c3      | integer                  |
 c4      | integer                  |
 c5      | integer                  |
 c6      | integer                  |
 c7      | integer                  |
 t_stamp | timestamp with time zone |
Indexes:
    "t_timescale_t_stamp_idx" btree (t_stamp DESC)
Triggers:
    ts_insert_blocker BEFORE INSERT ON t_timescale FOR EACH ROW EXECUTE FUNCTION _timescaledb_functions.insert_blocker()
Child tables: _timescaledb_internal._hyper_2_2_chunk,
              _timescaledb_internal._hyper_2_3_chunk,
              _timescaledb_internal._hyper_2_4_chunk,
              _timescaledb_internal._hyper_2_5_chunk,
              _timescaledb_internal._hyper_2_6_chunk
Access method: heap
```
Examination of one of the chunks confirms the partition is set at 10 minutes:
```
db01=# \d _timescaledb_internal._hyper_2_6_chunk
                        Table "_timescaledb_internal._hyper_2_6_chunk"
 Column  |           Type           | Collation | Nullable | Default
---------+--------------------------+-----------+----------+---------------------------------------------------
 id      | uuid                     |           |          |
 c2      | integer                  |           |          | (random() * '1000000'::numeric::double precision)
 c3      | integer                  |           |          | (random() * '1000000'::numeric::double precision)
 c4      | integer                  |           |          | (random() * '1000000'::numeric::double precision)
 c5      | integer                  |           |          | (random() * '1000000'::numeric::double precision)
 c6      | integer                  |           |          | (random() * '1000000'::numeric::double precision)
 c7      | integer                  |           |          | (random() * '1000000'::numeric::double precision)
 t_stamp | timestamp with time zone |           | not null | clock_timestamp()
Indexes:
    "_hyper_2_6_chunk_t_timescale_t_stamp_idx" btree (t_stamp DESC)
Check constraints:
    "constraint_6" CHECK (t_stamp >= '2024-01-30 19:20:00+00'::timestamp with time zone AND t_stamp < '2024-01-30 19:30:00+00'::timestamp with time zone)
Inherits: public.t_timescale
```
The inconsistent chunk sizes, as indicated in the results below, arise because the number of records inserted varied within each assigned 10-minute interval:
```
                     List of Tables
        Schema         |       Name       |   Size
-----------------------+------------------+----------
 public                | t_standard       | 36000 MB    TOTAL 36 GB
-----------------------+------------------+----------
 _timescaledb_internal | _hyper_2_2_chunk | 4292 MB
 _timescaledb_internal | _hyper_2_3_chunk | 8540 MB
 _timescaledb_internal | _hyper_2_4_chunk | 8712 MB
 _timescaledb_internal | _hyper_2_5_chunk | 8204 MB
 _timescaledb_internal | _hyper_2_6_chunk | 6769 MB     TOTAL 36.5 GB
```
You’ll notice the TimescaleDB indexes are significantly larger than the btree index created for table t_standard. Evidently, they contain more information/data:
```
                                         List of Indexes
        Schema         |                   Name                   |      Table       |  Size
-----------------------+------------------------------------------+------------------+---------
 public                | _t_standard_t_stamp_idx                  | t_standard       | 10 GB      TOTAL 10 GB
-----------------------+------------------------------------------+------------------+---------
 _timescaledb_internal | _hyper_2_2_chunk_t_timescale_t_stamp_idx | _hyper_2_2_chunk | 2262 MB
 _timescaledb_internal | _hyper_2_3_chunk_t_timescale_t_stamp_idx | _hyper_2_3_chunk | 4500 MB
 _timescaledb_internal | _hyper_2_4_chunk_t_timescale_t_stamp_idx | _hyper_2_4_chunk | 4591 MB
 _timescaledb_internal | _hyper_2_5_chunk_t_timescale_t_stamp_idx | _hyper_2_5_chunk | 4323 MB
 _timescaledb_internal | _hyper_2_6_chunk_t_timescale_t_stamp_idx | _hyper_2_6_chunk | 3567 MB    TOTAL 19.2 GB
```
Administering hypertable chunks
The following focuses solely on simple chunk administration.
Chunk, general purpose function calls
These TimescaleDB functions are used to administer the chunks:
```
 Schema |           Name
--------+-------------------------
 public | chunks_detailed_size
 public | drop_chunks
 public | move_chunk
 public | reorder_chunk
 public | set_adaptive_chunking
 public | set_chunk_time_interval
 public | show_chunks
```
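To give a flavor of these calls, the following sketch lists and then drops chunks older than a given age. The one-hour interval is an arbitrary example value, not a recommendation:

```sql
-- list chunks whose data is entirely older than one hour
select show_chunks('t_timescale', older_than => interval '1 hour');

-- permanently drop those same chunks -- use with care!
select drop_chunks('t_timescale', older_than => interval '1 hour');
```

Dropping whole chunks is how TimescaleDB implements cheap data retention: it is a metadata operation, far faster than a DELETE over the same rows.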
Chunk, compression function calls
These TimescaleDB functions are used to administer a table’s compression:
```
 Schema |             Name
--------+------------------------------
 public | add_compression_policy
 public | chunk_compression_stats
 public | compress_chunk
 public | decompress_chunk
 public | hypertable_compression_stats
 public | recompress_chunk
 public | remove_compression_policy
```
Timescale chunk runtime parameters
Similar to the general collection of Postgres runtime parameters, some of these timescale parameters operate across the data cluster, while others can be fine-tuned for specific relations and even sessions.
```sql
select name, setting from pg_settings
where name ~ '^timescale'
  and name ~ 'compress'
union
select name, setting from pg_settings
where name ~ '^timescale'
  and name ~ 'chunk';
```
```
                      name                       | setting
-------------------------------------------------+---------
 timescaledb.enable_bulk_decompression           | on
 timescaledb.enable_chunk_append                 | on
 timescaledb.enable_chunkwise_aggregation        | on
 timescaledb.enable_compression_indexscan        | on
 timescaledb.enable_decompression_logrep_markers | off
 timescaledb.enable_decompression_sorted_merge   | on
 timescaledb.enable_dml_decompression            | on
 timescaledb.enable_hypertable_compression       | on
 timescaledb.enable_parallel_chunk_append        | on
 timescaledb.enable_transparent_decompression    | on
 timescaledb.max_cached_chunks_per_hypertable    | 1024
 timescaledb.max_open_chunks_per_insert          | 1024
```
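As an illustration of session-level tuning, a boolean parameter such as timescaledb.enable_chunk_append can be toggled for just the current session. Whether doing so helps is entirely workload-dependent; this is shown only to demonstrate the mechanism:

```sql
-- session-scoped override; reverts when the session ends
set timescaledb.enable_chunk_append = off;

-- confirm the change
show timescaledb.enable_chunk_append;
```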
Scenario two
Taking a small subset of the aforementioned functions and runtime parameters, this next scenario demonstrates how one can compress everything from an individual chunk to an entire table, the latter by setting a comprehensive policy based on chunk age, as one would under normal production conditions.
Compressing a chunk
Table compression is one of those features best labeled a killer feature. What’s especially cool is that one can not only reduce space consumption but also query, and perform DML operations against, a compressed hypertable.
```
-- Get Current State of CHUNKS
db01=# \dt+ _timescaledb_internal._hyper*chunk
                                      List of relations
        Schema         |       Name       | Type  |  Owner   | Persistence | Access method |  Size
-----------------------+------------------+-------+----------+-------------+---------------+---------
 _timescaledb_internal | _hyper_2_2_chunk | table | postgres | permanent   | heap          | 4292 MB
 _timescaledb_internal | _hyper_2_3_chunk | table | postgres | permanent   | heap          | 8540 MB
 _timescaledb_internal | _hyper_2_4_chunk | table | postgres | permanent   | heap          | 8712 MB
 _timescaledb_internal | _hyper_2_5_chunk | table | postgres | permanent   | heap          | 8204 MB
 _timescaledb_internal | _hyper_2_6_chunk | table | postgres | permanent   | heap          | 6769 MB
```
```sql
-- ENABLE COMPRESSION
alter table t_timescale
set (
    timescaledb.compress,
    timescaledb.compress_orderby = 't_stamp asc'
);
```
```sql
-- Time: 163081.593 ms (02:43.082)
select * from compress_chunk('_timescaledb_internal._hyper_2_4_chunk', true);
```
Notice how the chunk’s size is now reported as zero. In order to get the true size of the now-compressed relation, you’ll need to use a function call; see below for an example invocation.
```
db01=# \dt+ _timescaledb_internal._hyper*chunk
        Schema         |       Name       | Type  |  Owner   | Persistence | Access method |  Size
-----------------------+------------------+-------+----------+-------------+---------------+---------
 _timescaledb_internal | _hyper_2_2_chunk | table | postgres | permanent   | heap          | 4292 MB
 _timescaledb_internal | _hyper_2_3_chunk | table | postgres | permanent   | heap          | 8540 MB
 _timescaledb_internal | _hyper_2_4_chunk | table | postgres | permanent   | heap          | 0 bytes
 _timescaledb_internal | _hyper_2_5_chunk | table | postgres | permanent   | heap          | 8204 MB
 _timescaledb_internal | _hyper_2_6_chunk | table | postgres | permanent   | heap          | 6769 MB
```
This function gives a complete set of metrics describing the now compressed chunk:
```sql
-- Get Current State of Compression
select * from hypertable_compression_stats('t_timescale');
```
```
db01=# select * from hypertable_compression_stats('t_timescale');
-[ RECORD 1 ]------------------+------------
total_chunks                   | 5
number_compressed_chunks       | 1
before_compression_table_bytes | 9135104000
before_compression_index_bytes | 4813725696
before_compression_toast_bytes | 0
before_compression_total_bytes | 13948829696
after_compression_table_bytes  | 24150016
after_compression_index_bytes  | 0
after_compression_toast_bytes  | 4946141184
after_compression_total_bytes  | 4970291200
node_name                      |
```
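For per-chunk rather than whole-table numbers, the companion function chunk_compression_stats (listed in the compression function calls above) can be queried; a sketch, with the column selection assumed from the TimescaleDB 2.x API:

```sql
-- per-chunk compression status and before/after total sizes
select chunk_name,
       compression_status,
       before_compression_total_bytes,
       after_compression_total_bytes
from chunk_compression_stats('t_timescale');
```

This is handy for spotting which chunks a policy has already processed and which are still pending.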
Decompressing a chunk
It’s just as easy to decompress a chunk as it is to compress one. The typical reason is to maximize performance of the various DML operations that must be performed against it.
```sql
-- Decompress the chunk
-- Time: 267639.222 ms (04:27.639)
select * from decompress_chunk('_timescaledb_internal._hyper_2_4_chunk');
```
Setting a chunk compression policy
As previously demonstrated, setting the chunk interval (time) is a fairly straightforward process. So, too, is setting the compression policy: one merely chooses the hypertable and how long after a chunk has been created before it is finally compressed. Having a time-based argument for compression recognizes the need to process the most recent data in the shortest amount of time:
```sql
-- RECALL: ENABLE COMPRESSION
alter table t_timescale
set (
    timescaledb.compress,
    timescaledb.compress_orderby = 't_stamp asc'
);
```
This compression policy compresses chunks that are older than 30 minutes:
```sql
select add_compression_policy('t_timescale', compress_after => INTERVAL '30m');

 add_compression_policy
------------------------
                   1000
```
Testing the newly set compression policy is accomplished by inserting new records:
```sql
-- Populate new records
insert into t_timescale(id)
    select gen_random_uuid() from generate_series(1, 2E8);
INSERT 0 200000000
Time: 1100742.301 ms (18:20.742)

select * from pg_size_pretty(hypertable_size('t_timescale'));
 pg_size_pretty
----------------
 41 GB
```
Note the partition size differences between the old and new ones:
```
db01=# \dt+ _timescaledb_internal._hyper*chunk
                                       List of relations
        Schema         |       Name        | Type  |  Owner   | Persistence | Access method |  Size
-----------------------+-------------------+-------+----------+-------------+---------------+---------
 _timescaledb_internal | _hyper_2_12_chunk | table | postgres | permanent   | heap          | 3231 MB
 _timescaledb_internal | _hyper_2_14_chunk | table | postgres | permanent   | heap          | 8265 MB
 _timescaledb_internal | _hyper_2_15_chunk | table | postgres | permanent   | heap          | 3111 MB
 _timescaledb_internal | _hyper_2_2_chunk  | table | postgres | permanent   | heap          | 0 bytes
 _timescaledb_internal | _hyper_2_3_chunk  | table | postgres | permanent   | heap          | 0 bytes
 _timescaledb_internal | _hyper_2_4_chunk  | table | postgres | permanent   | heap          | 0 bytes
 _timescaledb_internal | _hyper_2_5_chunk  | table | postgres | permanent   | heap          | 0 bytes
 _timescaledb_internal | _hyper_2_6_chunk  | table | postgres | permanent   | heap          | 0 bytes
```
```
db01=# select * from hypertable_compression_stats('t_timescale');
-[ RECORD 1 ]------------------+------------
total_chunks                   | 8
number_compressed_chunks       | 5
before_compression_table_bytes | 38291226624
before_compression_index_bytes | 20177428480
before_compression_toast_bytes | 0
before_compression_total_bytes | 58468655104
after_compression_table_bytes  | 101351424
after_compression_index_bytes  | 0
after_compression_toast_bytes  | 20732502016
after_compression_total_bytes  | 20833853440
node_name                      |
```
Disabling the compression policy is accomplished thusly:
```sql
SELECT remove_compression_policy('t_timescale');
```
Caveat
As with all technologies, especially new ones, there’s always some form of limitation. TimescaleDB is no exception:
- Foreign key constraints referencing a hypertable are not supported.
- Time dimensions (columns) used for partitioning cannot have NULL values.
- UNIQUE indexes must include all columns that are partitioning dimensions.
- UPDATE statements that move values between partitions (chunks) are not supported.
- Horizontal scaling, i.e., multi-node support, is no longer supported. Instead, it is recommended that a distributed network file system, such as Ceph, be used for scaling purposes.
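The UNIQUE-index caveat is worth a quick illustration against the example hypertable: a unique index on id alone would be rejected, because it omits the partitioning dimension t_stamp. A sketch:

```sql
-- rejected: the partitioning column t_stamp is not part of the index
-- create unique index on t_timescale(id);

-- accepted: the partitioning dimension is included
create unique index on t_timescale(id, t_stamp);
```

The practical consequence is that uniqueness can only be enforced per time point, not globally across all time, since each chunk checks the constraint independently.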
References
https://docs.timescale.com/
https://docs.timescale.com/api/latest/
https://docs.timescale.com/api/latest/compression/alter_table_compression/
https://docs.timescale.com/use-timescale/latest/compression/about-compression/