[Notice] Deployed an SQL optimization. Decreased about 97% execution time.

in #steem5 days ago (edited)

Here is the detail of changing:

Why

  • Optimize the hive_posts_ix4 index to avoid table re-access when reading author data.
  • Use NOT EXISTS to replace NOT IN. In PostgreSQL, NOT EXISTS is more efficient than NOT IN.

Before:

QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=306.32..330.92 rows=46 width=36) (actual time=4.487..4.488 rows=1 loops=1)
   Group Key: hive_posts.parent_id
   ->  Index Scan using hive_posts_ix8 on hive_posts  (cost=306.32..330.11 rows=46 width=8) (actual time=4.461..4.476 rows=10 loops=1)
         Index Cond: (parent_id = 112041412)
         Filter: ((NOT is_deleted) AND (NOT (hashed SubPlan 1)))
         SubPlan 1
           ->  Seq Scan on hive_posts_status  (cost=0.00..275.05 rows=12282 width=8) (actual time=0.006..1.612 rows=12274 loops=1)
                 Filter: (list_type = '3'::smallint)
                 Rows Removed by Filter: 1490
 Planning Time: 4.241 ms
 Execution Time: 4.607 ms
(11 rows)

After:

QUERY PLAN                                                                          
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.85..252.71 rows=53 width=36) (actual time=0.081..0.081 rows=1 loops=1)
   Group Key: p.parent_id
   ->  Nested Loop Anti Join  (cost=0.85..251.78 rows=53 width=8) (actual time=0.025..0.076 rows=11 loops=1)
         ->  Index Only Scan using hive_posts_ix4_optimized on hive_posts p  (cost=0.57..11.96 rows=90 width=18) (actual time=0.013..0.017 rows=11 loops=1)
               Index Cond: (parent_id = 112041412)
               Heap Fetches: 3
         ->  Index Only Scan using idx_hive_posts_status_list_type_author on hive_posts_status s  (cost=0.29..2.66 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=11)
               Index Cond: ((list_type = '3'::smallint) AND (author = (p.author)::text))
               Heap Fetches: 0
 Planning Time: 6.994 ms
 Execution Time: 0.105 ms
(11 rows)

Result

We can find that the execution time from 4.607 ms to 0.105 ms, decreased about 97% execution time.

The Seq Scan is avoided. That would be the reason why we can see a long time SQL on hive_posts_status table, when the large requests come.

Index Only Scan shows that the author data is acquired in one query, not re-access table.

Next, I will optimize another slow SQLs in this Pull Request. Because merging code is really not an easy thing.

Feel free to any issue.

Sort:  

Are you aware that steemdb.io has been unreachable for a while - maybe a week or more?

504 Gateway Time-out