Admin
It occurs to me that nobody has mentioned the WTF in the writing (not the WTF in the subject of the writing, but in the writing itself).
The article suggests strongly that Perl is a write-only language. This assertion is ridiculous. Anybody with an ounce of knowledge knows that while Perl isn't easy to read, its illegibility is a mere grain of sand in the path of knowledge when compared to the Himalayan heights of the illegibility of APL.
Of course, APL code is all Greek to me... (APL's name is also the lamest TLA I've encountered so far.)
Admin
http://en.wikipedia.org/wiki/APL_%28programming_language%29
Really? Could they honestly not come up with anything better than that?
Admin
TBH, that query sucks. You need to level up in SQL a couple more times. It's written with a procedural-programming mindset, and the performance is suboptimal.
The problem?
You get my point? The SQL engine has to disentangle that query by executing each of the sub-selects in sequential order, innermost first. I don't think the row estimator works that well with that pattern either. (It may be more intelligent nowadays; I only have experience with Postgres 8.4.)
You're supposed to write each of those as joins. If it's a 1:n relation, it might be tricky, but n:1 is trivial. Something like this:
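A rough sketch of what I mean, guessing at the forum's schema, so every table, column, and group name here is an assumption:

```sql
-- Instead of the nested IN ( SELECT ... IN ( SELECT ... ) ) chain,
-- express the same relationships as joins and filter at the end.
SELECT p.id, p.raw
FROM posts p
JOIN group_users gu ON gu.user_id = p.user_id
JOIN groups g       ON g.id = gu.group_id
WHERE g.name = 'trust_level_1'   -- hypothetical group filter
  AND p.raw LIKE '%UUID%';       -- the expensive text match stays in one place
```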
Of course, that's likely to give extra/duplicate-ish rows if we're dealing with 1:n relations, since joins multiply rows. Whether that's a problem, I don't know. It depends on the code.
But anyway, the query looks a lot nicer to me when written as joins.
Filed under: Armchair SQL non-expert, I'm probably in the wrong thread for this anyway
Admin
And then in computer studies, the teacher asks someone to evaluate
0.0 × log(sin(0)).
Admin
Uh-huh. Let me know how that works out for you.
Meanwhile, with a real dataset, your suggested query runs slower than mine - not that the actual time taken is a problem, since it's less than a second (and the entire query takes less than 2 seconds) for a query that's run at most once a minute...
Your version first, followed by mine:
And a repeat of both in case you think caching might be an issue:
Admin
Can I gets an `EXPLAIN ANALYZE` for both queries?
Admin
Was working on it when you posted... :laughing:
Admin
That is curious. Three seq scans. I expected only one. You sure you aren't missing an index on `group_users (user_id)` or `group_users (group_id)`? You seem to have a composite index on `group_users (group_id, user_id)`, which can't be used for `user_id`.
Yours is using that composite index, mine isn't.
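If one of those really is missing, the fix would be something along these lines (index names invented; I obviously can't see your schema):

```sql
-- Hypothetical single-column indexes, so look-ups by either column alone
-- can use an index instead of falling back to a seq scan.
CREATE INDEX idx_group_users_user_id  ON group_users (user_id);
CREATE INDEX idx_group_users_group_id ON group_users (group_id);
```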
Anyway, this is mostly academic since the `LIKE '%UUID%'` is the real I/O-eater in those queries.
Admin
I'm not missing anything - this isn't my database :laughing: - it's the one this forum runs on and I have no control over the schema. But since you ask..
It surprised me how fast that bit of the query ran when I first started experimenting with the concept.
Admin
Continuing with the pointlessly academic - I added two indices for the columns you mentioned (I'm running off a restored backup, not the live DB). I'm not convinced it has helped:
Admin
This kinda makes me want to try and rewrite some of the queries in my dev DB with subqueries rather than joins, just to see if there's a noticeable difference.
Admin
How long do they take to run? Sub-second? Seconds? Minutes?
Admin
The plan confuses me. `posts` is a massive table, right? Postgres should be intelligent enough to only scan the sixteen posts it finds via the join (granted, it thinks that it'll find 226 posts); instead it's seq scanning the whole table.
The `groups_users` and `groups` tables seem pretty small (`gu` having a few thousand rows, `groups` fewer than a hundred?), neither really benefiting from an index.
Is it missing an index on `posts (user_id)` then? That's the only reason I can think of right now.
Admin
Sub-second. Not really looking for a performance boost at this point, I'm just curious.
Admin
Nope - there's one there...
Admin
There's your issue. It's a partial index. Pg can't use that.
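The reason, sketched with guessed names (I'm assuming the partial index only covers non-deleted posts, and the `raw` column and the literal values are stand-ins):

```sql
-- Hypothetical recreation of the index in question: it only covers non-deleted rows.
CREATE INDEX index_posts_on_user_id ON posts (user_id) WHERE deleted_at IS NULL;

-- Nothing in this WHERE clause implies deleted_at IS NULL, so the planner
-- can't prove the partial index contains every row the query might need:
SELECT id FROM posts WHERE user_id = 20 AND raw LIKE '%UUID%';

-- Repeat the index's predicate and the index becomes usable:
SELECT id FROM posts WHERE user_id = 20 AND raw LIKE '%UUID%' AND deleted_at IS NULL;
```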
Try adding `p.deleted_at IS NULL` to both queries next to the `LIKE` and see what happens.
Admin
groups_users contains stuff like UserX is TL0, TL1 and TL2. For example, for me:
Admin
Admin
Seriously, what the hell is wrong with that query, then? It's still not using the index, even though I'm sure it should. Once it does, the runtime should be well under 50 milliseconds.
If you're willing to make the `posts (user_id)` index non-partial on your test server, try it.
You probably shouldn't even be making partial indices unless more than, say, 50 per cent of your rows are excluded by the condition. (I sure hope the users on this forum aren't deleting half of their posts.)
Admin
I'm presuming there's no issue with having both - I didn't remove the other one..
4.5%. 1 in every 23 posts still seems excessive...
Admin
Most of them are probably Paula every morning. But, yeah...that's more than I'd expect these days.
Admin
Hmm - I don't remember deleting so many...
Admin
:wtf:
Admin
No, shouldn't be an issue.
This is beyond me. I have no further ideas. You can perhaps try the pgsql-performance mailing list if you're interested. I'd be very interested in knowing why it's seqscanning the posts table even though it has everything it needs not to.
There's always the nuclear option of `SET enable_seqscan TO off;`, which disables it for the session. Shirley the EXPLAINs should then use the index?
Admin
Well, as has been pointed out, for a whole query that takes < 2 seconds and runs at most once a minute, the whole issue is academic...
I'm not too worried about it.
Admin
I noticed you left the `p.deleted_at IS NULL` in the queries after you created the non-partial index; perhaps it's affecting the plan.
Admin
Not a lot of difference - timings are comparable:
Admin
Try the `SET enable_seqscan TO off` and `EXPLAIN` the queries. If it's still not using the index on `posts`, there's something I'm missing.
Admin
Well, it's halved the time taken for your suggested query (213.444 ms), and done even slightly better for mine (202.261 ms).
I won't pretend to even remotely understand the query plans - reading up on that now...
Admin
That would be your cue to put regular expressions away and get a real parser.
That's just one more reason to stay away from Facebook.
Admin
0
2 4 6
4 8 16
8 10 24
those three numbers follow a pattern rule. what is the rule?
Admin
They look okay now. The posts table part of the query is now using the index, as it should.
Postgres thinks the nested loop variant takes 100k pagefetches [1] while the seqscan on `posts` takes only 24k. But it's not telling me why: the cost estimate just jumps to 100k; its direct sub-items have costs of 286-ish and 441-ish...
Why is it using the nested loop strategy? Postgres usually prefers hash joins, doesn't it?
(Answering myself: maybe there are too few rows in the intermediate resultset for a hash join to be beneficial, or maybe the resultset isn't wide enough. You can disable the relevant `enable_*` setting, `enable_nestloop` in this case, and see what the planner thinks of its backup plan and how much it costs...)
Anyway, the thing I was missing is now obvious, I think: you're searching all posts by yourself. Postgres says this will be [the amount of rows returned from g_u] * [average amount of posts per user], as it thinks it'll have to find 241 users' information (in reality: 16 ids, which I think are all the same `user_id = 20`?). So 241 users * avg_postcount means it would have to do the `posts (user_id)` index look-up for a sizeable percentage of the rows...
On second thought, that can't be it. It's looking up fewer than a thousand rows - half a per cent of the whole table. Perfectly reasonable to use an index then. So scrap this.
[1]: Well, not really 'pagefetches'; it's measured on an arbitrary scale used by the planner. 'Planner unit cost' perhaps?
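For the `enable_*` trick mentioned above, the pattern is roughly this (session-local, and the query itself is just a stand-in):

```sql
-- Forbid the nested-loop strategy for this session only, so EXPLAIN shows
-- what the planner considers the next-best plan and what it thinks it costs.
SET enable_nestloop TO off;

EXPLAIN ANALYZE
SELECT p.id
FROM posts p
WHERE p.user_id = 20
  AND p.raw LIKE '%UUID%'
  AND p.deleted_at IS NULL;

-- Put the setting back to its default afterwards.
RESET enable_nestloop;
```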
Admin
Each row is the first number times the column number, but the middle column is being expressed in hexadecimal instead of decimal.
:stuck_out_tongue:
Admin
........ ok that does match, but no.....
also i really shouldn't play guessing games here so......
https://www.youtube.com/watch?v=vKA4w2O61Xo
Admin
Can't watch it from work as I don't have headphones. I'll have to watch it later.
Although I will say I tried a few different things when doing this, but either the 10 or the 16/24 seemed to block the various things I tried. So I figured I'd go with the tongue-in-cheek guess. :smiley:
Admin
It was for this experiment: http://what.thedailywtf.com/t/high-unread-count/3597
@SignatureGuy was able to successfully generate around 3000 posts but was only able to remove 150 of them before running into some DiscoLimits.
Admin
I agree with whatever @VinDuv posted just above.
Admin
You posted them.
For the database discussion, isn't this why you can move posts?
Admin
I can move posts?
News to me!
Admin
heh
Admin
We did, but then the idiocracy dragged us back again.
Admin
I would wonder if it's the join order that's confusing the query engine. The nested IN statements kind of control the JOIN order, and the subquery WHERE clauses act like JOIN conditions, while the alternative query does all the JOINs and then filters with the WHERE clause.
Or even:
I also wonder how this performs, as EXISTS can outperform IN, although that usually only happens when the subquery is very large:
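Something like this, again guessing at the schema, so every name in it is an assumption:

```sql
-- EXISTS variant: a correlated subquery instead of IN lists or explicit joins.
SELECT p.id
FROM posts p
WHERE p.raw LIKE '%UUID%'
  AND p.deleted_at IS NULL
  AND EXISTS (
        SELECT 1
        FROM group_users gu
        JOIN groups g ON g.id = gu.group_id
        WHERE gu.user_id = p.user_id
          AND g.name = 'trust_level_1'   -- hypothetical group filter
      );
```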
Theoretically, of course, all these queries should have the same query plan if the query planner is smart enough.
Admin
Regexes are worth taking the time to understand. It's not some innate ability that people who read them have; it's simply a matter of taking the time to learn them.
Yes, they can be abused (they really shouldn't be hundreds of characters long), but if a regex fits on a line, any professional should be able to figure it out.
I suppose I translate things in my mind too much, but I see "COMPLICATED BAD. GROG NOT UNDERSTAND. GROG ANGRY." in your post.
Admin
Maybe they are; I'm not denying it. But the point is, they wouldn't take nearly as much time to understand if somebody hadn't so seriously, badly, absolutely fucked up the syntax.
Complicated is bad. If you can have the same thing, but simpler, you'd obviously choose that over the more complicated solution.
Admin
I agree. I have always avoided regexes if there was a simpler, more understandable way of doing things. But then I have never felt the need to obscure my code as an exercise in job protection.
One of my least favorite programmers used to use regexes to validate numeric input dates. In Java. When it was pointed out to him that February has 28 or 29 days depending on the year, he eventually produced an eye-watering regex to handle it (it will fail in 2100). Mind you, he was also one of those fedora-wearing people who believe that Exceptions are bad, women can't program, and customers are always wrong.
Admin
They're good for what they're designed to do: describing matchers for Regular Languages. It's when people use them to do other things that the WTFs mount up.
Thus, if you were building a date parser from scratch, you'd use REs as one of the first steps, to extract the interesting fields (tokenizers are big RE use cases). It wouldn't tell you whether the date was valid, but it would let you get rid of a lot of obviously invalid stuff rapidly. Of course, most languages have a library (possibly built-in) that handles all that stuff for you and it would indicate a WTF to not use it — though which WTF it indicates isn't always simple — but that library call may well have been designed to use REs internally.
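To make that concrete with the database we've already been poking at (the literals are invented for the example):

```sql
-- A regex happily tokenizes a date that doesn't exist...
SELECT regexp_matches('2100-02-29', '^(\d{4})-(\d{2})-(\d{2})$');

-- ...while the built-in date parser knows 2100 isn't a leap year:
SELECT '2100-02-29'::date;  -- ERROR: date/time field value out of range
```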
Admin
Exactly. And that library has been produced by more than one person, thoroughly reviewed and tested. So this is one wheel nobody should be reinventing to show off their 1337 regex skills. (Deliberate use of obsolescent slang for emphasis.)
Admin
FTFY 0.45% (inb4 WTF is ‰)
Admin
Well, 2 out of 3 correct is not that bad. </troll> <Also I'll let you guess my thoughts on which one of those three is wrong.>
Admin
It's a memorial to the brave souls who attempt to clean up this purposefully obfuscated mess.