Comments on: Introducing PostgreSQL 15: Working with SELECT DISTINCT https://www.percona.com/blog/introducing-postgresql-15-working-with-distinct/ Wed, 29 Mar 2023 18:29:23 +0000 hourly 1 https://wordpress.org/?v=6.5.2 By: rbernier01 https://www.percona.com/blog/introducing-postgresql-15-working-with-distinct/#comment-10973685 Tue, 26 Jul 2022 00:08:57 +0000 https://www.percona.com/blog/?p=81726#comment-10973685 Unfortunately, not too many people know about HyperLogLog even though it’s been around for years.

]]>
By: Fazal Majid https://www.percona.com/blog/introducing-postgresql-15-working-with-distinct/#comment-10973684 Mon, 25 Jul 2022 12:41:53 +0000 https://www.percona.com/blog/?p=81726#comment-10973684 It’s worth mentioning that at scale no one uses SELECT DISTINCT but instead fast approximate methods like HyperLogLog as implemented in CitusData’s postgresql-hll:

https://github.com/citusdata/postgresql-hll

]]>
By: Peter Gram https://www.percona.com/blog/introducing-postgresql-15-working-with-distinct/#comment-10973683 Fri, 22 Jul 2022 17:48:44 +0000 https://www.percona.com/blog/?p=81726#comment-10973683 you write “This SQL statement returns those records filtering out the UNIQUE values found in column “c1” in SORTED order: select distinct on(c1) * from t_ex;”.
The SQL standard is open to if distinct should return data sorted or random since this i implementation dependent. Oracle used to return values from “distinct” and “group by” sorted but from version 9.x this is not true longer since they started to use a hash parallel function and after this values are not sorted.
The official statement is if you need data in sorted order use a “order by”.

]]>
By: rbernier01 https://www.percona.com/blog/introducing-postgresql-15-working-with-distinct/#comment-10973665 Thu, 14 Jul 2022 15:25:30 +0000 https://www.percona.com/blog/?p=81726#comment-10973665 It’s a good point. I think that so long as there’s enough information allowing its replication then people can hopefully takie what they’ve read as a starting point for their own investigations.

]]>
By: rbernier01 https://www.percona.com/blog/introducing-postgresql-15-working-with-distinct/#comment-10973664 Thu, 14 Jul 2022 15:19:02 +0000 https://www.percona.com/blog/?p=81726#comment-10973664 Hi, It’s an excellent point!

I’m always 2nd guessing myself how much tuning I should perform whenever I do one of these investigations. Over time I’ve realized that so long as the methodology and assumptions are documented then there’s hopefully enough information presented that it can be used at least as a starting point satisfying the reader’s own needs.

Hope this helps 🙂

]]>
By: Oliver https://www.percona.com/blog/introducing-postgresql-15-working-with-distinct/#comment-10973663 Wed, 13 Jul 2022 14:50:21 +0000 https://www.percona.com/blog/?p=81726#comment-10973663 Good to see, thank you.

I see that the query is spilling to disk – from the query plan:

Sort Method: external merge Disk: 68432kB

It would be interesting to see the timing results if the machine is configured with enough memory and/or suitable work_mem etc. parameters so that that doesn’t happen, which would give a more representative use. Otherwise the disk activity could swamp pretty much every other factor.

]]>