• (disco)

    It occurs to me that nobody has mentioned the WTF in the writing (not the WTF in the subject of the writing, but in the writing itself).

    The article suggests strongly that Perl is a write-only language. This assertion is ridiculous. Anybody with an ounce of knowledge knows that while Perl isn't easy to read, its illegibility is a mere grain of sand in the path of knowledge when compared to the Himalayan heights of the illegibility of APL.

    Of course, APL code is all Greek to me... (APL's name is also the the lamest TLA I've encountered so far.)

  • (disco) in reply to Steve_The_Cynic
    Steve_The_Cynic:
    APL's name is also the the lamest TLA I've encountered so far

    http://en.wikipedia.org/wiki/APL_%28programming_language%29

    Really? Could they honestly not come up with anythinng better than that?

  • (disco) in reply to PJH

    TBH, that query sucks. You need to level-up in SQL a couple more times. It's written with the procedural-programming mindset, and the performance is suboptimal.

    The problem?

    SELECT ...
        WHERE user_id IN  (
            SELECT ...
            WHERE group_id IN(
                SELECT ...
                WHERE g.name IN ('admins')
            )
        )
    ),
    

    You get my point? The SQL engine has to disentangle that query by executing each of the sub-selects in sequential order, innermost first. I don't think the row estimator works that great with that pattern either. (It may be more intelligent nowadays; I've only experience with Postgres 8.4)

    You're supposed to write each of those as joins. If it's a 1:n relation, it might be tricky, but n:1 is trivial. Something like this:

    SELECT p.user_id, p.id, p.topic_id, p.post_number
    	FROM posts p
    	JOIN groups_users gu ON p.user_id = gu.user_id
    	JOIN groups g ON gu.group_id = g.id
    	WHERE raw LIKE '%[UUID omitted for obvious reasons]%' AND
    		g.name IN ('admins');
    

    Of course, that's likely to give extra/duplicate-ish rows if we're dealing with 1:n relations, thanks to group theory. Whether that's a problem, I don't know. It depends on the code.

    But anyway, the query looks a lot nicer to me when written as joins.


    Filed under: Armchair SQL non-expert, I'm probably in the wrong thread for this anyway

  • (disco) in reply to chubertdev

    And then in computer studies, the teacher asks someone to evaluate 0.0×log(sin(0)).

  • (disco) in reply to hhaamu
    hhaamu:
    TBH, that query sucks. You need to level-up in SQL a couple more times. It's written with the procedural-programming mindset, and the performance is suboptimal.

    [suggested reworking of query]

    Uh-huh. Let me know how that works out for you.

    Meanwhile, with a real dataset, your suggested query runs slower than mine - not that the actual time taken is a problem since it's less than a second (and the entire query takes less than 2) for a query that's run at most once a minute...

    Your version first, followed by mine:

    discourse=# SELECT p.user_id, p.id, p.topic_id, p.post_number
    discourse-# FROM posts p
    discourse-# JOIN group_users gu ON p.user_id = gu.user_id
    discourse-# JOIN groups g ON gu.group_id = g.id
    discourse-# WHERE raw LIKE '%[uid removed]%' AND
    discourse-# g.name IN ('admins');
     user_id |   id   | topic_id | post_number
    ---------+--------+----------+-------------
          20 | 131276 |     4292 |           1
          20 | 136772 |     2929 |          17
          20 | 203068 |     6979 |        1186
          20 | 131160 |     1673 |        3634
          20 | 131125 |     1673 |        3632
          20 | 131170 |     3125 |        4585
          20 | 131167 |     1000 |       27836
    (7 rows)
    
    Time: 442.626 ms
    discourse=# SELECT user_id, id, topic_id, post_number
    discourse-# FROM posts
    discourse-# WHERE raw LIKE '%[uid removed]%' AND
    discourse-# user_id IN  (
    discourse(# SELECT gu.user_id
    discourse(# FROM group_users gu
    discourse(# WHERE group_id IN(
    discourse(# SELECT g.id
    discourse(# FROM groups g
    discourse(# WHERE g.name IN ('admins')
    discourse(# )
    discourse(# );
     user_id |   id   | topic_id | post_number
    ---------+--------+----------+-------------
          20 | 131276 |     4292 |           1
          20 | 136772 |     2929 |          17
          20 | 203068 |     6979 |        1186
          20 | 131160 |     1673 |        3634
          20 | 131125 |     1673 |        3632
          20 | 131170 |     3125 |        4585
          20 | 131167 |     1000 |       27836
    (7 rows)
    
    Time: 427.125 ms
    

    And a repeat of both in case you might think caching might be an issue:

    discourse=# SELECT p.user_id, p.id, p.topic_id, p.post_number
    FROM posts p
    JOIN group_users gu ON p.user_id = gu.user_id
    JOIN groups g ON gu.group_id = g.id
    WHERE raw LIKE '%[uid removed]%' AND
    g.name IN ('admins');
     user_id |   id   | topic_id | post_number
    ---------+--------+----------+-------------
          20 | 131276 |     4292 |           1
          20 | 136772 |     2929 |          17
          20 | 203068 |     6979 |        1186
          20 | 131160 |     1673 |        3634
          20 | 131125 |     1673 |        3632
          20 | 131170 |     3125 |        4585
          20 | 131167 |     1000 |       27836
    (7 rows)
    
    Time: 408.344 ms
    discourse=# SELECT user_id, id, topic_id, post_number
    FROM posts
    WHERE raw LIKE '%[uid removed]%' AND
    user_id IN  (
    SELECT gu.user_id
    FROM group_users gu
    WHERE group_id IN(
    SELECT g.id
    FROM groups g
    WHERE g.name IN ('admins')
    )
    );
     user_id |   id   | topic_id | post_number
    ---------+--------+----------+-------------
          20 | 131276 |     4292 |           1
          20 | 136772 |     2929 |          17
          20 | 203068 |     6979 |        1186
          20 | 131160 |     1673 |        3634
          20 | 131125 |     1673 |        3632
          20 | 131170 |     3125 |        4585
          20 | 131167 |     1000 |       27836
    (7 rows)
    
    Time: 398.363 ms
    
  • (disco) in reply to PJH

    Can I gets an EXPLAIN ANALYZE for both queries?

  • (disco) in reply to hhaamu

    Was working on it when you posted... :laughing:

    discourse=# explain analyze SELECT p.user_id, p.id, p.topic_id, p.post_number 
    FROM posts p
    JOIN group_users gu ON p.user_id = gu.user_id
    JOIN groups g ON gu.group_id = g.id
    WHERE raw LIKE '%[uuid removed]%' AND
    g.name IN ('admins');
                                                            QUERY PLAN                                                         
    ---------------------------------------------------------------------------------------------------------------------------
     Nested Loop  (cost=0.00..24412.01 rows=6 width=16) (actual time=428.519..437.167 rows=7 loops=1)
       Join Filter: (gu.user_id = p.user_id)
       ->  Nested Loop  (cost=0.00..107.49 rows=226 width=4) (actual time=0.046..10.108 rows=16 loops=1)
             Join Filter: (gu.group_id = g.id)
             ->  Seq Scan on groups g  (cost=0.00..1.20 rows=1 width=4) (actual time=0.027..0.032 rows=1 loops=1)
                   Filter: ((name)::text = 'admins'::text)
             ->  Seq Scan on group_users gu  (cost=0.00..61.13 rows=3613 width=8) (actual time=0.011..5.062 rows=3613 loops=1)
       ->  Materialize  (cost=0.00..24233.38 rows=21 width=16) (actual time=0.306..26.670 rows=12 loops=16)
             ->  Seq Scan on posts p  (cost=0.00..24233.28 rows=21 width=16) (actual time=4.871..426.393 rows=12 loops=1)
                   Filter: (raw ~~ '%[uuid removed]%'::text)
     Total runtime: 437.236 ms
    (11 rows)
    discourse=# explain analyze SELECT user_id, id, topic_id, post_number
    FROM posts
    WHERE raw LIKE '%[uuid removed]%' AND
    user_id IN  (
    SELECT gu.user_id
    FROM group_users gu
    WHERE group_id IN(
    SELECT g.id
    FROM groups g
    WHERE g.name IN ('admins')
    )
    );
                                                                                  QUERY PLAN
    
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
    -------
     Nested Loop Semi Join  (cost=7.32..24347.55 rows=19 width=16) (actual time=5.487..438.280 rows=7 loops=1)
       Join Filter: (posts.user_id = gu.user_id)
       ->  Seq Scan on posts  (cost=0.00..24233.28 rows=21 width=16) (actual time=5.305..437.557 rows=12 loops=1)
             Filter: (raw ~~ '%[uuid removed]%'::text)
       ->  Materialize  (cost=7.32..38.96 rows=241 width=4) (actual time=0.010..0.035 rows=13 loops=12)
             ->  Nested Loop  (cost=7.32..37.75 rows=241 width=4) (actual time=0.085..0.164 rows=16 loops=1)
                   ->  HashAggregate  (cost=1.20..1.21 rows=1 width=4) (actual time=0.030..0.033 rows=1 loops=1)
                         ->  Seq Scan on groups g  (cost=0.00..1.20 rows=1 width=4) (actual time=0.014..0.017 rows=1 loops=1)
                               Filter: ((name)::text = 'admins'::text)
                   ->  Bitmap Heap Scan on group_users gu  (cost=6.12..33.53 rows=241 width=8) (actual time=0.044..0.074 rows=16 loops=1)
                         Recheck Cond: (gu.group_id = g.id)
                         ->  Bitmap Index Scan on index_group_users_on_group_id_and_user_id  (cost=0.00..6.06 rows=241 width=0) (actual time=0.034..0.034 rows=16 lo
    ops=1)
                               Index Cond: (gu.group_id = g.id)
     Total runtime: 438.405 ms
    (14 rows)
    
    Time: 439.718 ms
    discourse=#
    
  • (disco) in reply to PJH

    That is curious. Three seq scans. I expected only one. You sure you aren't missing an index on group_users (user_id) or group_users (group_id)? You seem to have a composite index on group_users (group_id, user_id) which can't be used for user_id

    Yours is using that composite index, mine isn't.

    Anyway, this is mostly academic since the LIKE '%UUID%' is the real I/O-eater in those queries.

  • (disco) in reply to hhaamu
    hhaamu:
    You sure you aren't missing an index on group_users (user_id) or group_users (group_id)?

    I'm not missing anything - this isn't my database :laughing: - it's the one this forum runs on and I have no control over the schema. But since you ask..

    discourse=# \d+ group_users
                                                             Table "public.group_users"
       Column   |            Type             |                        Modifiers                         | Storage | Stats target | Description
    ------------+-----------------------------+----------------------------------------------------------+---------+--------------+-------------
     id         | integer                     | not null default nextval('group_users_id_seq'::regclass) | plain   |              |
     group_id   | integer                     | not null                                                 | plain   |              |
     user_id    | integer                     | not null                                                 | plain   |              |
     created_at | timestamp without time zone | not null                                                 | plain   |              |
     updated_at | timestamp without time zone | not null                                                 | plain   |              |
    Indexes:
        "group_users_pkey" PRIMARY KEY, btree (id)
        "index_group_users_on_group_id_and_user_id" UNIQUE, btree (group_id, user_id)
    Has OIDs: no
    
    discourse=#
    
    hhaamu:
    Anyway, this is mostly academic since the LIKE '%UUID%' is the real I/O-eater in those queries.
    It surprised me how fast that bit of the query ran when I first started experimenting with the concept.
  • (disco)

    Continuing with the pointlessly academic - I added two indices for the columns you mentioned (I'm running off a restored backup - not the live DB.) I'm not convinced it's helped:

    discourse=# \d+ group_users
                                                             Table "public.group_users"
       Column   |            Type             |                        Modifiers                         | Storage | Stats target | Description
    ------------+-----------------------------+----------------------------------------------------------+---------+--------------+-------------
     id         | integer                     | not null default nextval('group_users_id_seq'::regclass) | plain   |              |
     group_id   | integer                     | not null                                                 | plain   |              |
     user_id    | integer                     | not null                                                 | plain   |              |
     created_at | timestamp without time zone | not null                                                 | plain   |              |
     updated_at | timestamp without time zone | not null                                                 | plain   |              |
    Indexes:
        "group_users_pkey" PRIMARY KEY, btree (id)
        "index_group_users_on_group_id_and_user_id" UNIQUE, btree (group_id, user_id)
        "index_group_users_on_group_id" btree (group_id)
        "index_group_users_on_user_id" btree (user_id)
    Has OIDs: no
    
    discourse=# explain analyze SELECT p.user_id, p.id, p.topic_id, p.post_number
    FROM posts p
    JOIN group_users gu ON p.user_id = gu.user_id              
    JOIN groups g ON gu.group_id = g.id
    WHERE raw LIKE '%[uuid removed]%' AND
    g.name IN ('admins');
                                                                            QUERY PLAN                                                                        
    ----------------------------------------------------------------------------------------------------------------------------------------------------------
     Nested Loop  (cost=0.00..24352.10 rows=6 width=16) (actual time=5.464..426.206 rows=7 loops=1)
       Join Filter: (gu.group_id = g.id)
       ->  Seq Scan on groups g  (cost=0.00..1.20 rows=1 width=4) (actual time=0.019..0.024 rows=1 loops=1)
             Filter: ((name)::text = 'admins'::text)
       ->  Nested Loop  (cost=0.00..24349.61 rows=103 width=20) (actual time=5.432..426.020 rows=93 loops=1)
             ->  Seq Scan on posts p  (cost=0.00..24233.28 rows=21 width=16) (actual time=5.413..425.264 rows=12 loops=1)
                   Filter: (raw ~~ '%[uuid removed]%'::text)
             ->  Index Scan using index_group_users_on_user_id on group_users gu  (cost=0.00..5.51 rows=2 width=8) (actual time=0.013..0.034 rows=8 loops=12)
                   Index Cond: (gu.user_id = p.user_id)
     Total runtime: 426.325 ms
    (10 rows)
    
    Time: 429.926 ms
    discourse=# explain analyze SELECT user_id, id, topic_id, post_number
    FROM posts
    WHERE raw LIKE '%[uuid removed]%' AND
    user_id IN  (
    SELECT gu.user_id
    FROM group_users gu
    WHERE group_id IN(
    SELECT g.id
    FROM groups g
    WHERE g.name IN ('admins')
    )
    );
                                                                            QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------------------------------------
     Nested Loop Semi Join  (cost=7.32..24347.55 rows=19 width=16) (actual time=6.373..405.408 rows=7 loops=1)
       Join Filter: (posts.user_id = gu.user_id)
       ->  Seq Scan on posts  (cost=0.00..24233.28 rows=21 width=16) (actual time=6.150..404.654 rows=12 loops=1)
             Filter: (raw ~~ '%[uuid removed]%'::text)
       ->  Materialize  (cost=7.32..38.96 rows=241 width=4) (actual time=0.013..0.039 rows=13 loops=12)
             ->  Nested Loop  (cost=7.32..37.75 rows=241 width=4) (actual time=0.123..0.207 rows=16 loops=1)
                   ->  HashAggregate  (cost=1.20..1.21 rows=1 width=4) (actual time=0.034..0.036 rows=1 loops=1)
                         ->  Seq Scan on groups g  (cost=0.00..1.20 rows=1 width=4) (actual time=0.015..0.018 rows=1 loops=1)
                               Filter: ((name)::text = 'admins'::text)
                   ->  Bitmap Heap Scan on group_users gu  (cost=6.12..33.53 rows=241 width=8) (actual time=0.076..0.114 rows=16 loops=1)
                         Recheck Cond: (gu.group_id = g.id)
                         ->  Bitmap Index Scan on index_group_users_on_group_id  (cost=0.00..6.06 rows=241 width=0) (actual time=0.058..0.058 rows=16 loops=1)
                               Index Cond: (gu.group_id = g.id)
     Total runtime: 405.567 ms
    (14 rows)
    
    Time: 406.973 ms
    discourse=# SELECT p.user_id, p.id, p.topic_id, p.post_number
    FROM posts p
    JOIN group_users gu ON p.user_id = gu.user_id
    JOIN groups g ON gu.group_id = g.id
    WHERE raw LIKE '%[uuid removed]%' AND
    g.name IN ('admins');
     user_id |   id   | topic_id | post_number
    ---------+--------+----------+-------------
          20 | 131276 |     4292 |           1
          20 | 136772 |     2929 |          17
          20 | 203068 |     6979 |        1186
          20 | 131160 |     1673 |        3634
          20 | 131125 |     1673 |        3632
          20 | 131170 |     3125 |        4585
          20 | 131167 |     1000 |       27836
    (7 rows)
    
    Time: 431.255 ms
    discourse=# SELECT user_id, id, topic_id, post_number
    FROM posts
    WHERE raw LIKE '%[uuid removed]%' AND
    user_id IN  (
    SELECT gu.user_id
    FROM group_users gu
    WHERE group_id IN(
    SELECT g.id
    FROM groups g
    WHERE g.name IN ('admins')
    )
    );
     user_id |   id   | topic_id | post_number
    ---------+--------+----------+-------------
          20 | 131276 |     4292 |           1
          20 | 136772 |     2929 |          17
          20 | 203068 |     6979 |        1186
          20 | 131160 |     1673 |        3634
          20 | 131125 |     1673 |        3632
          20 | 131170 |     3125 |        4585
          20 | 131167 |     1000 |       27836
    (7 rows)
    
    Time: 411.554 ms
    
  • (disco) in reply to PJH

    This kinda makes me want to try and rewrite some of the queries in my dev DB with subqueries rather than joins, just to see if there's a noticeable difference.

  • (disco) in reply to Onyx

    How long do they take to run? Sub-second? Seconds? Minutes?

  • (disco) in reply to PJH

    The plan confuses me. posts is a massive table, right? Postgres should be intelligent enough to only scan the sixteen posts it finds via the join (granted, it thinks that it'll find 226 posts); instead it's seq scanning the whole table.

    The groups_users and groups tables seem pretty small (gu having a few thousand rows, groups fewer than a hundred?), neither really benefitting from an index.

    Is it missing an index on posts (user_id) then? That's the only reason I can think of right now.

  • (disco) in reply to PJH
    PJH:
    How long do they take to run? Sub-second? Seconds? Minutes?

    Sub-second. Not really looking for a performance boost at this point, I'm just curious.

  • (disco) in reply to hhaamu
    hhaamu:
    The plan confuses me. posts is a massive table, right?
    discourse=# select count(*) from posts;
     count
    --------
     212502
    (1 row)
    
    Time: 187.287 ms
    
    hhaamu:
    Is it missing an index on posts (user_id) then? That's the only reason I can think of right now.

    Nope - there's one there...

    discourse=# \d+ posts
                                                                    Table "public.posts"
             Column          |            Type             |                     Modifiers                      | Storage  | Stats target | Description
    -------------------------+-----------------------------+----------------------------------------------------+----------+--------------+-------------
     id                      | integer                     | not null default nextval('posts_id_seq'::regclass) | plain    |              |
     user_id                 | integer                     |                                                    | plain    |              |
     topic_id                | integer                     | not null                                           | plain    |              |
     post_number             | integer                     | not null                                           | plain    |              |
     raw                     | text                        | not null                                           | extended |              |
     cooked                  | text                        | not null                                           | extended |              |
     created_at              | timestamp without time zone | not null                                           | plain    |              |
     updated_at              | timestamp without time zone | not null                                           | plain    |              |
     reply_to_post_number    | integer                     |                                                    | plain    |              |
     reply_count             | integer                     | not null default 0                                 | plain    |              |
     quote_count             | integer                     | not null default 0                                 | plain    |              |
     deleted_at              | timestamp without time zone |                                                    | plain    |              |
     off_topic_count         | integer                     | not null default 0                                 | plain    |              |
     like_count              | integer                     | not null default 0                                 | plain    |              |
     incoming_link_count     | integer                     | not null default 0                                 | plain    |              |
     bookmark_count          | integer                     | not null default 0                                 | plain    |              |
     avg_time                | integer                     |                                                    | plain    |              |
     score                   | double precision            |                                                    | plain    |              |
     reads                   | integer                     | not null default 0                                 | plain    |              |
     post_type               | integer                     | not null default 1                                 | plain    |              |
     vote_count              | integer                     | not null default 0                                 | plain    |              |
     sort_order              | integer                     |                                                    | plain    |              |
     last_editor_id          | integer                     |                                                    | plain    |              |
     hidden                  | boolean                     | not null default false                             | plain    |              |
     hidden_reason_id        | integer                     |                                                    | plain    |              |
     notify_moderators_count | integer                     | not null default 0                                 | plain    |              |
     spam_count              | integer                     | not null default 0                                 | plain    |              |
     illegal_count           | integer                     | not null default 0                                 | plain    |              |
     inappropriate_count     | integer                     | not null default 0                                 | plain    |              |
     last_version_at         | timestamp without time zone | not null                                           | plain    |              |
     user_deleted            | boolean                     | not null default false                             | plain    |              |
     reply_to_user_id        | integer                     |                                                    | plain    |              |
     percent_rank            | double precision            | default 1.0                                        | plain    |              |
     notify_user_count       | integer                     | not null default 0                                 | plain    |              |
     like_score              | integer                     | not null default 0                                 | plain    |              |
     deleted_by_id           | integer                     |                                                    | plain    |              |
     edit_reason             | character varying(255)      |                                                    | extended |              |
     word_count              | integer                     |                                                    | plain    |              |
     version                 | integer                     | not null default 1                                 | plain    |              |
     cook_method             | integer                     | not null default 1                                 | plain    |              |
     wiki                    | boolean                     | not null default false                             | plain    |              |
     baked_at                | timestamp without time zone |                                                    | plain    |              |
     baked_version           | integer                     |                                                    | plain    |              |
     hidden_at               | timestamp without time zone |                                                    | plain    |              |
     self_edits              | integer                     | not null default 0                                 | plain    |              |
     reply_quoted            | boolean                     | not null default false                             | plain    |              |
     via_email               | boolean                     | not null default false                             | plain    |              |
     raw_email               | text                        |                                                    | extended |              |
     public_version          | integer                     | not null default 1                                 | plain    |              |
    Indexes:
        "posts_pkey" PRIMARY KEY, btree (id)
        "index_posts_on_topic_id_and_post_number" UNIQUE, btree (topic_id, post_number)
        "idx_posts_created_at_topic_id" btree (created_at, topic_id) WHERE deleted_at IS NULL
        "idx_posts_user_id_deleted_at" btree (user_id) WHERE deleted_at IS NULL
        "index_posts_on_reply_to_post_number" btree (reply_to_post_number)
    Has OIDs: no
    
    discourse=#
    
  • (disco) in reply to PJH
    PJH:
    "idx_posts_user_id_deleted_at" btree (user_id) WHERE deleted_at IS NULL

    There's your issue. It's a partial index. Pg can't use that.

    Try adding p.deleted_at IS NULL to both queries next to the LIKE and see what happens.

  • (disco) in reply to hhaamu
    hhaamu:
    groups_users and groups tables seem pretty small (gu having a few thousand rows, groups fewer than a hundred?),
    discourse=# select count(*) from group_users;
     count
    -------
      3613
    (1 row)
    
    Time: 1.222 ms
    discourse=# select count(*) from groups;
     count
    -------
        16
    (1 row)
    
    Time: 0.324 ms
    discourse=#
    
    

    groups_users contains stuff like UserX is TL0, TL1 and TL2. For example, for me:

    discourse=# select g.name from group_users gu join groups g on g.id=gu.group_id where user_id=20;
            name
    --------------------
     admins
     staff
     trust_level_2
     trust_level_0
     trust_level_1
     trust_level_3
     area_gbr
     moderators
     super_sekret_group
    (9 rows)
    
    Time: 0.770 ms
    
    discourse=# select id, name, user_count, alias_level, visible from groups;
     id |        name         | user_count | alias_level | visible
    ----+---------------------+------------+-------------+---------
      0 | everyone            |          0 |           0 | t
     42 | area_gbr            |          7 |        NULL | t
     51 | area_bel            |          2 |           0 | t
     52 | super_sekret_group  |          5 |           0 | f
     41 | area_usa            |         10 |        NULL | t
     50 | bots                |         34 |           0 | t
     43 | area_deu            |          3 |           0 | t
     49 | programmers_testers |         30 |        NULL | t
      1 | admins              |         16 |           0 | t
      2 | moderators          |          9 |           2 | f
      3 | staff               |         18 |           0 | t
     10 | trust_level_0       |       1715 |           2 | f
     11 | trust_level_1       |       1215 |           0 | t
     12 | trust_level_2       |        473 |           0 | t
     13 | trust_level_3       |         72 |           0 | t
     14 | trust_level_4       |          5 |           0 | t
    (16 rows)
    
    Time: 0.450 ms
    
    
  • (disco) in reply to hhaamu
    discourse=# explain analyze SELECT p.user_id, p.id, p.topic_id, p.post_number
    discourse-# FROM posts p
    discourse-# JOIN group_users gu ON p.user_id = gu.user_id
    discourse-# JOIN groups g ON gu.group_id = g.id
    discourse-# WHERE raw LIKE '%[uuid omitted]%' AND
    discourse-# p.deleted_at IS NULL AND
    discourse-# g.name IN ('admins');
                                                                            QUERY PLAN
    ----------------------------------------------------------------------------------------------------------------------------------------------------------
     Nested Loop  (cost=0.00..24352.10 rows=6 width=16) (actual time=4.968..440.987 rows=7 loops=1)
       Join Filter: (gu.group_id = g.id)
       ->  Seq Scan on groups g  (cost=0.00..1.20 rows=1 width=4) (actual time=0.014..0.019 rows=1 loops=1)
             Filter: ((name)::text = 'admins'::text)
       ->  Nested Loop  (cost=0.00..24349.61 rows=103 width=20) (actual time=4.943..440.799 rows=93 loops=1)
             ->  Seq Scan on posts p  (cost=0.00..24233.28 rows=21 width=16) (actual time=4.924..439.995 rows=12 loops=1)
                   Filter: ((deleted_at IS NULL) AND (raw ~~ '%[uuid omitted]%'::text))
             ->  Index Scan using index_group_users_on_user_id on group_users gu  (cost=0.00..5.51 rows=2 width=8) (actual time=0.012..0.032 rows=8 loops=12)
                   Index Cond: (gu.user_id = p.user_id)
     Total runtime: 441.102 ms
    (10 rows)
    
    Time: 442.732 ms
    discourse=# SELECT p.user_id, p.id, p.topic_id, p.post_number
    discourse-# FROM posts p
    discourse-# JOIN group_users gu ON p.user_id = gu.user_id
    discourse-# JOIN groups g ON gu.group_id = g.id
    discourse-# WHERE raw LIKE '%[uuid omitted]%' AND
    discourse-# p.deleted_at IS NULL AND
    discourse-# g.name IN ('admins');
     user_id |   id   | topic_id | post_number
    ---------+--------+----------+-------------
          20 | 131276 |     4292 |           1
          20 | 136772 |     2929 |          17
          20 | 203068 |     6979 |        1186
          20 | 131160 |     1673 |        3634
          20 | 131125 |     1673 |        3632
          20 | 131170 |     3125 |        4585
          20 | 131167 |     1000 |       27836
    (7 rows)
    
    Time: 454.986 ms
    
    discourse=# explain analyze SELECT user_id, id, topic_id, post_number
    discourse-# FROM posts p
    discourse-# WHERE raw LIKE '%[uuid omitted]%' AND
    discourse-# p.deleted_at IS NULL AND
    discourse-# user_id IN  (
    discourse(# SELECT gu.user_id
    discourse(# FROM group_users gu
    discourse(# WHERE group_id IN(
    discourse(# SELECT g.id
    discourse(# FROM groups g
    discourse(# WHERE g.name IN ('admins')
    discourse(# )
    discourse(# );
                                                                            QUERY PLAN                                                                         
    -----------------------------------------------------------------------------------------------------------------------------------------------------------
     Nested Loop Semi Join  (cost=7.32..24347.55 rows=19 width=16) (actual time=5.313..433.193 rows=7 loops=1)
       Join Filter: (p.user_id = gu.user_id)
       ->  Seq Scan on posts p  (cost=0.00..24233.28 rows=21 width=16) (actual time=5.127..432.453 rows=12 loops=1)
             Filter: ((deleted_at IS NULL) AND (raw ~~ '%[uuid omitted]%'::text))
       ->  Materialize  (cost=7.32..38.96 rows=241 width=4) (actual time=0.010..0.035 rows=13 loops=12)
             ->  Nested Loop  (cost=7.32..37.75 rows=241 width=4) (actual time=0.089..0.169 rows=16 loops=1)
                   ->  HashAggregate  (cost=1.20..1.21 rows=1 width=4) (actual time=0.028..0.031 rows=1 loops=1)
                         ->  Seq Scan on groups g  (cost=0.00..1.20 rows=1 width=4) (actual time=0.014..0.016 rows=1 loops=1)
                               Filter: ((name)::text = 'admins'::text)
                   ->  Bitmap Heap Scan on group_users gu  (cost=6.12..33.53 rows=241 width=8) (actual time=0.048..0.079 rows=16 loops=1)
                         Recheck Cond: (gu.group_id = g.id)
                         ->  Bitmap Index Scan on index_group_users_on_group_id  (cost=0.00..6.06 rows=241 width=0) (actual time=0.038..0.038 rows=16 loops=1)
                               Index Cond: (gu.group_id = g.id)
     Total runtime: 433.317 ms
    (14 rows)
    
    Time: 434.701 ms
    discourse=# SELECT user_id, id, topic_id, post_number
    discourse-# FROM posts p
    discourse-# WHERE raw LIKE '%[uuid omitted]%' AND
    discourse-# p.deleted_at IS NULL AND
    discourse-# user_id IN  (
    discourse(# SELECT gu.user_id
    discourse(# FROM group_users gu
    discourse(# WHERE group_id IN(
    discourse(# SELECT g.id
    discourse(# FROM groups g
    discourse(# WHERE g.name IN ('admins')
    discourse(# )
    discourse(# );
     user_id |   id   | topic_id | post_number
    ---------+--------+----------+-------------
          20 | 131276 |     4292 |           1
          20 | 136772 |     2929 |          17
          20 | 203068 |     6979 |        1186
          20 | 131160 |     1673 |        3634
          20 | 131125 |     1673 |        3632
          20 | 131170 |     3125 |        4585
          20 | 131167 |     1000 |       27836
    (7 rows)
    
    Time: 417.335 ms
    discourse=#
    
  • (disco) in reply to PJH

    Seriously, what the hell is wrong with that query then. It's still not using the index, even though I'm sure it should. The runtime should then be well under 50 milliseconds when it does.

    If you're willing to make the posts (user_id) index non-partial on your test server, try it.

    You probably shouldn't even be making partial indices unless more than, say, 50 per cent of your rows are excluded by the condition. (I sure hope the users on this forum aren't deleting half of their posts.)

  • (disco) in reply to hhaamu
    hhaamu:
    If you're willing to make the posts (user_id) index non-partial on your test server, try it.

    I'm presuming there's no issue with having both - I didn't remove the other one..

    discourse=# \d+ posts
    <snip>
    Indexes:
        "posts_pkey" PRIMARY KEY, btree (id)
        "index_posts_on_topic_id_and_post_number" UNIQUE, btree (topic_id, post_number)
        "idx_posts_created_at_topic_id" btree (created_at, topic_id) WHERE deleted_at IS NULL
        "idx_posts_user_id" btree (user_id)
        "idx_posts_user_id_deleted_at" btree (user_id) WHERE deleted_at IS NULL
        "index_posts_on_reply_to_post_number" btree (reply_to_post_number)
    Has OIDs: no
    
    discourse=# explain analyze SELECT p.user_id, p.id, p.topic_id, p.post_number
    discourse-# FROM posts p
    discourse-# JOIN group_users gu ON p.user_id = gu.user_id
    discourse-# JOIN groups g ON gu.group_id = g.id
    discourse-# WHERE raw LIKE '%[uuid]%' AND
    discourse-# p.deleted_at IS NULL AND
    discourse-# g.name IN ('admins');
                                                                            QUERY PLAN
    ----------------------------------------------------------------------------------------------------------------------------------------------------------
     Nested Loop  (cost=0.00..24352.10 rows=6 width=16) (actual time=5.401..441.610 rows=7 loops=1)
       Join Filter: (gu.group_id = g.id)
       ->  Seq Scan on groups g  (cost=0.00..1.20 rows=1 width=4) (actual time=0.011..0.016 rows=1 loops=1)
             Filter: ((name)::text = 'admins'::text)
       ->  Nested Loop  (cost=0.00..24349.61 rows=103 width=20) (actual time=5.380..441.427 rows=93 loops=1)
             ->  Seq Scan on posts p  (cost=0.00..24233.28 rows=21 width=16) (actual time=5.349..440.544 rows=12 loops=1)
                   Filter: ((deleted_at IS NULL) AND (raw ~~ '%[uuid]%'::text))
             ->  Index Scan using index_group_users_on_user_id on group_users gu  (cost=0.00..5.51 rows=2 width=8) (actual time=0.015..0.043 rows=8 loops=12)
                   Index Cond: (gu.user_id = p.user_id)
     Total runtime: 441.695 ms
    (10 rows)
    
    Time: 443.248 ms
    discourse=# SELECT p.user_id, p.id, p.topic_id, p.post_number
    discourse-# FROM posts p
    discourse-# JOIN group_users gu ON p.user_id = gu.user_id
    discourse-# JOIN groups g ON gu.group_id = g.id
    discourse-# WHERE raw LIKE '%[uuid]%' AND
    discourse-# p.deleted_at IS NULL AND
    discourse-# g.name IN ('admins');
     user_id |   id   | topic_id | post_number
    ---------+--------+----------+-------------
          20 | 131276 |     4292 |           1
          20 | 136772 |     2929 |          17
          20 | 203068 |     6979 |        1186
          20 | 131160 |     1673 |        3634
          20 | 131125 |     1673 |        3632
          20 | 131170 |     3125 |        4585
          20 | 131167 |     1000 |       27836
    (7 rows)
    
    Time: 439.420 ms
    
    discourse=# explain analyze SELECT user_id, id, topic_id, post_number
    discourse-# FROM posts p
    discourse-# WHERE raw LIKE '%[uuid]%' AND
    discourse-# p.deleted_at IS NULL AND
    discourse-# user_id IN  (
    discourse(# SELECT gu.user_id
    discourse(# FROM group_users gu
    discourse(# WHERE group_id IN(
    discourse(# SELECT g.id
    discourse(# FROM groups g
    discourse(# WHERE g.name IN ('admins')
    discourse(# )
    discourse(# );
                                                                            QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------------------------------------
     Nested Loop Semi Join  (cost=7.32..24347.55 rows=19 width=16) (actual time=5.104..419.205 rows=7 loops=1)
       Join Filter: (p.user_id = gu.user_id)
       ->  Seq Scan on posts p  (cost=0.00..24233.28 rows=21 width=16) (actual time=4.923..418.486 rows=12 loops=1)
             Filter: ((deleted_at IS NULL) AND (raw ~~ '%[uuid]%'::text))
       ->  Materialize  (cost=7.32..38.96 rows=241 width=4) (actual time=0.010..0.035 rows=13 loops=12)
             ->  Nested Loop  (cost=7.32..37.75 rows=241 width=4) (actual time=0.085..0.175 rows=16 loops=1)
                   ->  HashAggregate  (cost=1.20..1.21 rows=1 width=4) (actual time=0.026..0.029 rows=1 loops=1)
                         ->  Seq Scan on groups g  (cost=0.00..1.20 rows=1 width=4) (actual time=0.011..0.014 rows=1 loops=1)
                               Filter: ((name)::text = 'admins'::text)
                   ->  Bitmap Heap Scan on group_users gu  (cost=6.12..33.53 rows=241 width=8) (actual time=0.047..0.091 rows=16 loops=1)
                         Recheck Cond: (gu.group_id = g.id)
                         ->  Bitmap Index Scan on index_group_users_on_group_id  (cost=0.00..6.06 rows=241 width=0) (actual time=0.036..0.036 rows=16 loops=1)
                               Index Cond: (gu.group_id = g.id)
     Total runtime: 419.305 ms
    (14 rows)
    
    Time: 420.674 ms
    discourse=# SELECT user_id, id, topic_id, post_number
    FROM posts p
    WHERE raw LIKE '%[uuid]%' AND
    p.deleted_at IS NULL AND
    user_id IN  (
    SELECT gu.user_id
    FROM group_users gu
    WHERE group_id IN(
    SELECT g.id
    FROM groups g
    WHERE g.name IN ('admins')
    )
    );
     user_id |   id   | topic_id | post_number
    ---------+--------+----------+-------------
          20 | 131276 |     4292 |           1
          20 | 136772 |     2929 |          17
          20 | 203068 |     6979 |        1186
          20 | 131160 |     1673 |        3634
          20 | 131125 |     1673 |        3632
          20 | 131170 |     3125 |        4585
          20 | 131167 |     1000 |       27836
    (7 rows)
    
    Time: 415.375 ms
    
    hhaamu:
    (I sure hope the users on this forum aren't deleting half of their posts.)

    4.5%. 1 in every 23 posts still seems excessive...

    discourse=# select count(*) from posts;
     count
    --------
     212502
    (1 row)
    
    Time: 164.560 ms
    discourse=# select count(*) from posts where deleted_at IS NULL;
     count
    --------
     211539
    (1 row)
    
    Time: 193.552 ms
    discourse=# select count(*) from posts where deleted_at IS NOT NULL;
     count
    -------
       963
    (1 row)
    
    Time: 193.930 ms
    discourse=#
    
  • (disco) in reply to PJH
    PJH:
    4.5%. 1 in every 23 posts still seems excessive...

    Most of them are probably Paula every morning. But, yeah...that's more than I'd expect these days.

  • (disco) in reply to boomzilla
    boomzilla:
    Most of them are probably Paula every morning.

    Hmm - I don't remember deleting so many...

    discourse=# select u.username, count(*) from posts p join users u on u.id=p.user_id where deleted_at is not null group by u.username order by count(*) desc;
           username       | count
    ----------------------+-------
     PaulaBean            |   268
     SignatureGuy         |   151
     PJH                  |    38
                          |    23
                          |    21
                          |    18
                          |    16
                          |    14
                          |    13
                          |    13
                          |    13
                          |    12
                          |    12
                          |    11
                          |    11
                          |    11
                          |    11
                          |    10
                          |    10
                          |     9
                          |     7
                          |     7
                          |     6
     boomzilla            |     6
                          |     6
                          |     6
                          |     6
                          |     6
                          |     5
                          |     5
                          |     5
                          |     5
                          |     4
    
  • (disco) in reply to PJH

    :wtf:

    PJH:
    SignatureGuy | 151
  • (disco) in reply to PJH
    PJH:
    I'm presuming there's no issue with having both - I didn't remove the other one..

    No, shouldn't be an issue.

    This is beyond me. I have no further ideas. You can perhaps try the pgsql-performance mailing list if you're interested. I'd be very interested in knowing why it's seqscanning the posts table even though it's got everything it needs to not to.

    There's always the nuclear option of SET enable_seqscan TO off; which disables it for the session. Shirley the EXPLAINs should then use the index?

  • (disco) in reply to hhaamu

    Well as has been pointed out, for a whole query that takes < 2 seconds and runs at most once a minute, the whole issue is academic..

    I'm not too worried about it.

  • (disco) in reply to PJH

    I noticed you left the p.deleted_at IS NULL in the queries after you created the non-partial index; perhaps it's affecting the plan.

  • (disco) in reply to hhaamu
    hhaamu:
    I noticed you left the p.deleted_at IS NULL in the queries after you created the non-partial index; perhaps it's affecting the plan.

    Not a lot of difference - timings are comparable:

    discourse=# explain analyze SELECT p.user_id, p.id, p.topic_id, p.post_number
    discourse-# FROM posts p
    discourse-# JOIN group_users gu ON p.user_id = gu.user_id
    discourse-# JOIN groups g ON gu.group_id = g.id
    discourse-# WHERE raw LIKE '%[uuid]%' AND
    discourse-# g.name IN ('admins');
                                                                            QUERY PLAN                                                   
                         
    -------------------------------------------------------------------------------------------------------------------------------------
    ---------------------
     Nested Loop  (cost=0.00..24352.10 rows=6 width=16) (actual time=5.507..417.202 rows=7 loops=1)
       Join Filter: (gu.group_id = g.id)
       ->  Seq Scan on groups g  (cost=0.00..1.20 rows=1 width=4) (actual time=0.019..0.025 rows=1 loops=1)
             Filter: ((name)::text = 'admins'::text)
       ->  Nested Loop  (cost=0.00..24349.61 rows=103 width=20) (actual time=5.476..417.019 rows=93 loops=1)
             ->  Seq Scan on posts p  (cost=0.00..24233.28 rows=21 width=16) (actual time=5.456..416.148 rows=12 loops=1)
                   Filter: (raw ~~ '%[uuid]%'::text)
             ->  Index Scan using index_group_users_on_user_id on group_users gu  (cost=0.00..5.51 rows=2 width=8) (actual time=0.022..0.
    043 rows=8 loops=12)
                   Index Cond: (gu.user_id = p.user_id)
     Total runtime: 417.339 ms
    (10 rows)
    
    Time: 419.887 ms
    discourse=# explain analyze SELECT user_id, id, topic_id, post_number
    discourse-# FROM posts p
    discourse-# WHERE raw LIKE '%[uuid]%' AND
    discourse-# user_id IN  (
    discourse(# SELECT gu.user_id
    discourse(# FROM group_users gu
    discourse(# WHERE group_id IN(
    discourse(# SELECT g.id
    discourse(# FROM groups g
    discourse(# WHERE g.name IN ('admins')));
                                                                            QUERY PLAN
    
    -------------------------------------------------------------------------------------------------------------------------------------
    ----------------------
     Nested Loop Semi Join  (cost=7.32..24347.55 rows=19 width=16) (actual time=5.832..409.068 rows=7 loops=1)
       Join Filter: (p.user_id = gu.user_id)
       ->  Seq Scan on posts p  (cost=0.00..24233.28 rows=21 width=16) (actual time=5.667..408.357 rows=12 loops=1)
             Filter: (raw ~~ '%[uuid]%'::text)
       ->  Materialize  (cost=7.32..38.96 rows=241 width=4) (actual time=0.008..0.035 rows=13 loops=12)
             ->  Nested Loop  (cost=7.32..37.75 rows=241 width=4) (actual time=0.067..0.145 rows=16 loops=1)
                   ->  HashAggregate  (cost=1.20..1.21 rows=1 width=4) (actual time=0.028..0.030 rows=1 loops=1)
                         ->  Seq Scan on groups g  (cost=0.00..1.20 rows=1 width=4) (actual time=0.012..0.015 rows=1 loops=1)
                               Filter: ((name)::text = 'admins'::text)
                   ->  Bitmap Heap Scan on group_users gu  (cost=6.12..33.53 rows=241 width=8) (actual time=0.027..0.057 rows=16 loops=1)
                         Recheck Cond: (gu.group_id = g.id)
                         ->  Bitmap Index Scan on index_group_users_on_group_id  (cost=0.00..6.06 rows=241 width=0) (actual time=0.018..0
    .018 rows=16 loops=1)
                               Index Cond: (gu.group_id = g.id)
     Total runtime: 409.184 ms
    (14 rows)
    
    Time: 410.682 ms
    
  • (disco) in reply to PJH

    Try the SET enable_seqscan TO off and EXPLAIN the queries. If it's still not using the index on posts, there's something I'm missing.

  • (disco) in reply to hhaamu

    Well it's halved the time taken for your suggested query (213.444ms). And even slightly better for mine(202.261ms).

    I won't pretend to even remotely totally understand the query plans - reading up on that now...

    discourse=# set enable_seqscan = off;
    SET
    Time: 0.215 ms
    discourse=# explain analyze SELECT p.user_id, p.id, p.topic_id, p.post_number
    FROM posts p
    JOIN group_users gu ON p.user_id = gu.user_id
    JOIN groups g ON gu.group_id = g.id
    WHERE raw LIKE '%[uuid]%' AND
    g.name IN ('admins');
                                                                              QUERY PLAN
    
    -------------------------------------------------------------------------------------------------------------------------------------
    --------------------------
     Nested Loop  (cost=4.74..100066.66 rows=6 width=16) (actual time=100.559..212.093 rows=7 loops=1)
       ->  Nested Loop  (cost=0.00..286.34 rows=226 width=4) (actual time=16.670..16.827 rows=16 loops=1)
             ->  Index Scan using index_groups_on_name on groups g  (cost=0.00..8.27 rows=1 width=4) (actual time=16.647..16.650 rows=1 l
    oops=1)
                   Index Cond: ((name)::text = 'admins'::text)
             ->  Index Scan using index_group_users_on_group_id on group_users gu  (cost=0.00..275.06 rows=241 width=8) (actual time=0.01
    2..0.102 rows=16 loops=1)
                   Index Cond: (gu.group_id = g.id)
       ->  Bitmap Heap Scan on posts p  (cost=4.74..441.24 rows=21 width=16) (actual time=7.725..12.191 rows=0 loops=16)
             Recheck Cond: (p.user_id = gu.user_id)
             Filter: (p.raw ~~ '%[uuid]%'::text)
             ->  Bitmap Index Scan on idx_posts_user_id  (cost=0.00..4.74 rows=429 width=0) (actual time=0.246..0.246 rows=953 loops=16)
                   Index Cond: (p.user_id = gu.user_id)
     Total runtime: 212.189 ms
    (12 rows)
    
    Time: 213.845 ms
    discourse=# explain analyze SELECT user_id, id, topic_id, post_number
    FROM posts p
    WHERE raw LIKE '%[uuid]%' AND
    user_id IN  (
    SELECT gu.user_id
    FROM group_users gu
    WHERE group_id IN(
    SELECT g.id
    FROM groups g
    WHERE g.name IN ('admins')));
                                                                            QUERY PLAN
    
    -------------------------------------------------------------------------------------------------------------------------------------
    ----------------------
     Nested Loop  (cost=52.10..138569.39 rows=19 width=16) (actual time=23.368..186.358 rows=7 loops=1)
       ->  HashAggregate  (cost=45.42..47.83 rows=241 width=4) (actual time=0.175..0.227 rows=16 loops=1)
             ->  Nested Loop  (cost=14.39..44.82 rows=241 width=4) (actual time=0.068..0.144 rows=16 loops=1)
                   ->  HashAggregate  (cost=8.27..8.28 rows=1 width=4) (actual time=0.024..0.025 rows=1 loops=1)
                         ->  Index Scan using index_groups_on_name on groups g  (cost=0.00..8.27 rows=1 width=4) (actual time=0.012..0.01
    5 rows=1 loops=1)
                               Index Cond: ((name)::text = 'admins'::text)
                   ->  Bitmap Heap Scan on group_users gu  (cost=6.12..33.53 rows=241 width=8) (actual time=0.035..0.067 rows=16 loops=1)
                         Recheck Cond: (gu.group_id = g.id)
                         ->  Bitmap Index Scan on index_group_users_on_group_id  (cost=0.00..6.06 rows=241 width=0) (actual time=0.027..0
    .027 rows=16 loops=1)
                               Index Cond: (gu.group_id = g.id)
       ->  Bitmap Heap Scan on posts p  (cost=6.67..574.52 rows=21 width=16) (actual time=7.452..11.621 rows=0 loops=16)
             Recheck Cond: (p.user_id = gu.user_id)
             Filter: (p.raw ~~ '%[uuid]%'::text)
             ->  Bitmap Index Scan on idx_posts_user_id  (cost=0.00..6.67 rows=429 width=0) (actual time=0.232..0.232 rows=953 loops=16)
                   Index Cond: (p.user_id = gu.user_id)
     Total runtime: 186.497 ms
    (16 rows)
    
    Time: 188.304 ms
    discourse=#
    
    
  • (disco) in reply to Maciejasjmj
    Maciejasjmj:
    It kinda works for simple matches, but then you try something even a bit more complex and inevitably end up with what looks like the results of banging your head on your keyboard for a prolonged period of time.

    That would be your cue to put regular expressions away and get a real parser.

    FrostCat:
    Facebook is FULL of people who do that, or something similar, only with simpler problems like "solve 0 * 7 + 7 - 7"

    That's just one more reason to stay away from Facebook.

  • (disco) in reply to FrostCat
    FrostCat:
    0 * 7 + 7 - 7

    0


    2 4 6 4 8 16 8 10 24

    those three numbers follow a pattern rule. what is the rule?

  • (disco) in reply to PJH
    PJH:
    I won't pretend to even remotely totally understand the query plans - reading up on that now...

    They look okay now. The posts table part of the query is now using the index, as it should.

    Postgres thinks the nested loop variant takes 100k pagefetches [1] while the seqscan on posts takes only 24k. But it's not telling me why. The cost estimate just jumps to 100k ; its direct sub-items have costs of 286-ish and 441-ish.

    ...Why is it using the nested loop strategy? Postgres usually prefers hash joins, doesn't it?

    (Answering to self: maybe there are too few rows in the intermediate resultset for a hash join to be beneficial, or maybe the resultset isn't wide enough. You can disable the relevant enable_*, enable_nestloop in this case, and see what the planner thinks as a backup plan and how much it costs...)


    Anyway, the thing I was missing is now obvious, I think, was that you're searching all posts by yourself. Postgres says this will be [the amount of rows returned from g_u] * [average amount of posts per user]

    As it thinks it'll have to find 241 users' information (In reality:16 ids, which I think are the same? user_id = 20?). So 241 users * avg_postcount means it should have to do the posts (user_id) index look-up a sizeable percentage of the rows...

    On second thought, that can't be it. It's looking up fewer than a thousand rows. Half a per cent of the whole table. Perfectly reasonable to use an index then. So scrap this.


    [1]: Well, not really 'pagefetches'; it's measured on an arbitrary scale used by the planner. 'Planner unit cost' perhaps?

  • (disco) in reply to accalia

    Each row is the first number times the column number, but the middle column is being expressed in hexadecimal instead of decimal.

    :stuck_out_tongue:

  • (disco) in reply to powerlord

    ........ ok that does match, but no.....

    also i really shouldn't play guessing games here so......

    https://www.youtube.com/watch?v=vKA4w2O61Xo

  • (disco) in reply to accalia

    Can't watch it from work as I don't have headphones. I'll have to watch it later.

    Although I will say I tried a few different things when doing this, but either the 10 or the 16/24 seemed to block the various things I tried. So I figured I'd go with the tongue-in-cheek guess. :smiley:

  • (disco) in reply to boomzilla
    boomzilla:
    :wtf:
    PJH:
    SignatureGuy | 151

    It was for this experiment: http://what.thedailywtf.com/t/high-unread-count/3597

    @SignatureGuy was able to successfuly generate around 3000 posts but was only able to remove 150 of them before running into some DiscoLimits.

  • (disco) in reply to VinDuv

    I agree with whatever @VinDuv posted just above.<t7649p89>

  • (disco) in reply to accalia
    accalia:
    0

    2 4 6 4 8 16 8 10 24

    those three numbers follow a pattern rule. what is the rule?

    You posted them.

    For the database discussion, isn't this why you can move posts?

  • (disco) in reply to chubertdev
    chubertdev:
    , isn't this why you can move posts?

    I can move posts?

    News to me!

  • (disco) in reply to accalia
    accalia:
    I can move posts?

    News to me!

    heh

    chubertdev:
    For the database discussion
  • (disco) in reply to Maciejasjmj
    Maciejasjmj:
    We should've moved to RPN a long time ago, I say

    We did, but then the idiocracy dragged us back again.

  • (disco) in reply to PJH

    I would wonder if it's the join order that's confusing the query engine. The nested IN statements kind of controls the JOIN order, and the subquery WHERE clauses act like JOIN conditions, while the alternative query does all the JOINs then filters with the WHERE clause.

    SELECT p.user_id, p.id, p.topic_id, p.post_number
    FROM posts p
    JOIN (group_users gu 
        JOIN groups g ON gu.group_id = g.id)
        ON p.user_id = gu.user_id
    WHERE raw LIKE '%[uuid]%' 
        AND g.name IN ('admins');
    

    Or even:

    SELECT p.user_id, p.id, p.topic_id, p.post_number
    FROM (groups g
        JOIN group_users gu
            ON  gu.group_id = g.id
            AND g.name IN ('admin'))
    JOIN posts p
        ON p.user_id = gu.user_id
    WHERE p.raw LIKE '%[uuid]%' 
    

    I also wonder how this performs, as EXISTS can outperform IN, although it's only usual to do so when the subquery is very large:

    SELECT p.user_id, p.id, p.topic_id, p.post_number
    FROM posts p
    WHERE EXISTS (SELECT 1 
        FROM groups g
        JOIN group_users gu
            ON gu.group_id = g.id
        WHERE g.name IN ('admin')
            AND gu.user_id = p.user_id)
    

    Theoretically, of course, all these queries should have the same query plan if the query planner is smart enough.

  • (disco) in reply to Maciejasjmj

    Regexes are worth taking the time to understand. It's not some innate ability that people who read them have, its simply taking the time to learn them.

    Yes, they can be abused (they really shouldn't be hundreds of characters long). If it fits on a line, any professional should be able to figure it out.

    I suppose I translate things in my mind too much, but I see "COMPLICATED BAD. GROG NOT UNDERSTAND. GROG ANGRY." in your post.

  • (disco) in reply to coyo
    coyo:
    Regexes are worth taking the time to understand.

    Maybe they are, I'm not denying it. But the point is, they wouldn't take nearly as much time to understand if somebody didn't seriously, badly, absolutely fucked up the syntax.

    coyo:
    I suppose I translate things in my mind too much, but I see "COMPLICATED BAD. GROG NOT UNDERSTAND. GROG ANGRY." in your post.

    Complicated is bad. If you can have the same thing, but simpler, you'd obviously choose that over the more complicated solution.

  • (disco) in reply to Maciejasjmj
    Maciejasjmj:
    Complicated is bad. If you can have the same thing, but simpler, you'd obviously choose that over the more complicated solution.

    I agree. I have always avoided regexes if there was a simpler, more understandable way of doing things. But then I have never felt the need to obscure my code as an exercise in job protection.

    One of my least favorite programmers used to use regexes to validate input numeric dates. In Java. When it was pointed out to him that February has 28 or 29 days depending on the yearnumber, he eventually produced an eyewatering regex to handle it (it will fail in 2100). Mind you, he was also one of those fedora wearing people who believe that Exceptions are bad, women can't program, and customers are always wrong.

  • (disco) in reply to kupfernigk
    kupfernigk:
    I have always avoided regexes if there was a simpler, more understandable way of doing things.

    They're good for what they're designed to do: describing matchers for Regular Languages. It's when people use them to do other things that the WTFs mount up.

    Thus, if you were building a date parser from scratch, you'd use REs as one of the first steps, to extract the interesting fields (tokenizers are big RE use cases). It wouldn't tell you whether the date was valid, but it would let you get rid of a lot of obviously invalid stuff rapidly. Of course, most languages have a library (possibly built-in) that handles all that stuff for you and it would indicate a WTF to not use it — though which WTF it indicates isn't always simple — but that library call may well have been designed to use REs internally.

  • (disco) in reply to dkf
    dkf:
    but that library call may well have been designed to use REs internally.

    Exactly. And that library has been produced by more than one person, thoroughly reviewed and tested. So this is one wheel nobody should be reinventing to show off their 1337 regex skills. (Deliberate use of obsolescent slang for emphasis.)

  • (disco) in reply to PJH
    PJH:
    4.5[s]%[/s] **‰** . 1 in every [s]23[/s] **221** posts still seems **mildly** excessive...

    FTFY 0.45% (inb4 WTF is ‰)

  • (disco) in reply to kupfernigk
    kupfernigk:
    Mind you, he was also one of those fedora wearing people who believe that Exceptions are bad, women can't program, and customers are always wrong.

    Well, 2 out of 3 correct is not that bad. </troll> <Also I'll let you guess my thoughts on which one of those three is wrong.>

  • (disco) in reply to Stuart_Jones

    It's a memorial to the brave souls who attempt to clean up this purposefully obfuscated mess.

Leave a comment on “A Shining Perl”

Log In or post as a guest

Replying to comment #:

« Return to Article