[Notice] Deployed an SQL optimization. Decreased about 97% execution time.
Here is the detail of changing:
- https://github.com/steemit/hivemind/pull/352/commits/ec08ab52003426837e63602a7f40868a5185332f
- https://github.com/steemit/hivemind/pull/352/commits/5f20fbf77b9aa7840d3ec192b612957f6ae13aa4
Why
- Optimize the
hive_posts_ix4
index to avoid table re-access when reading author data. - Use
NOT EXISTS
to replaceNOT IN
. In PostgreSQL,NOT EXISTS
is more efficient thanNOT IN
.
Before:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=306.32..330.92 rows=46 width=36) (actual time=4.487..4.488 rows=1 loops=1)
Group Key: hive_posts.parent_id
-> Index Scan using hive_posts_ix8 on hive_posts (cost=306.32..330.11 rows=46 width=8) (actual time=4.461..4.476 rows=10 loops=1)
Index Cond: (parent_id = 112041412)
Filter: ((NOT is_deleted) AND (NOT (hashed SubPlan 1)))
SubPlan 1
-> Seq Scan on hive_posts_status (cost=0.00..275.05 rows=12282 width=8) (actual time=0.006..1.612 rows=12274 loops=1)
Filter: (list_type = '3'::smallint)
Rows Removed by Filter: 1490
Planning Time: 4.241 ms
Execution Time: 4.607 ms
(11 rows)
After:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=0.85..252.71 rows=53 width=36) (actual time=0.081..0.081 rows=1 loops=1)
Group Key: p.parent_id
-> Nested Loop Anti Join (cost=0.85..251.78 rows=53 width=8) (actual time=0.025..0.076 rows=11 loops=1)
-> Index Only Scan using hive_posts_ix4_optimized on hive_posts p (cost=0.57..11.96 rows=90 width=18) (actual time=0.013..0.017 rows=11 loops=1)
Index Cond: (parent_id = 112041412)
Heap Fetches: 3
-> Index Only Scan using idx_hive_posts_status_list_type_author on hive_posts_status s (cost=0.29..2.66 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=11)
Index Cond: ((list_type = '3'::smallint) AND (author = (p.author)::text))
Heap Fetches: 0
Planning Time: 6.994 ms
Execution Time: 0.105 ms
(11 rows)
Result
We can find that the execution time from 4.607 ms to 0.105 ms, decreased about 97% execution time.
The Seq Scan
is avoided. That would be the reason why we can see a long time SQL on hive_posts_status
table, when the large requests come.
Index Only Scan
shows that the author
data is acquired in one query, not re-access table.
Next, I will optimize another slow SQLs in this Pull Request. Because merging code is really not an easy thing.
Feel free to any issue.
Are you aware that steemdb.io has been unreachable for a while - maybe a week or more?
0.00 SBD,
0.09 STEEM,
0.09 SP